By Qin Hai (秦海), Toradex
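1). Introduction
On industrial embedded ARM platforms, eMMC NAND flash is currently the most common storage device. Because industrial equipment usually has a long life cycle, the wear life of the eMMC is critical to the whole device. This article therefore walks through an example eMMC wear-test procedure on the NXP i.MX8M Mini ARM processor platform.
For a basic introduction to eMMC storage devices, see the article below. An eMMC device typically consists of an eMMC NAND flash controller plus a number of NAND flash dies. Every access the ARM host makes to the eMMC is mapped through the NAND flash controller, which also handles wear leveling, ECC, and bad block management so that the eMMC operates stably and reliably.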
eMMC (Linux) | Toradex Developer Center
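The wear life of an eMMC device is determined mainly by the number of P/E (program/erase) cycles that its NAND flash cells can endure. Different NAND flash types have different typical P/E ratings. Rough reference values are listed below; they vary by vendor and manufacturing process.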
./ SLC Nand Flash - 10K - 100K P/E Cycles
./ MLC Nand Flash - 3K - 10K, normally 3K P/E Cycles
./ 2D TLC Nand Flash - normally 1K P/E Cycles
./ 3D TLC Nand Flash - normally 3K P/E Cycles
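However, the NAND controller programs NAND flash cells in units of at least one page, while erasing happens in units of at least one block. When the data written or erased is not an integer multiple of these units, extra overhead is incurred, and wear leveling, garbage collection, and bad block management add further overhead. As a result, the amount of data the host can actually write over the device's full life is smaller than the theoretical amount computed from the cell P/E cycles (eMMC capacity * P/E cycles). This difference is expressed as the WAF (Write Amplification Factor): (eMMC capacity * P/E cycles) / actual full-lifetime data written. See the article below for more details.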
Wear Estimation Using eMMC Flash Devices
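As a rough illustration of the WAF formula above, the following sketch computes the theoretical lifetime write volume and the resulting WAF with awk. The function names and the sample figures (a 16 GB device rated at 3000 P/E cycles, with 24000 GB of host data written) are hypothetical and are not measurements from this test.

```shell
#!/bin/sh
# Theoretical full-lifetime data volume: capacity * rated P/E cycles.
# $1 = eMMC capacity in GB, $2 = rated P/E cycles
theoretical_gb() {
    awk -v cap="$1" -v pe="$2" 'BEGIN { printf "%d\n", cap * pe }'
}

# WAF as defined in this article:
# (capacity * P/E cycles) / data actually written by the host over the full life.
# $1 = capacity in GB, $2 = rated P/E cycles, $3 = host data written in GB
waf() {
    awk -v cap="$1" -v pe="$2" -v written="$3" \
        'BEGIN { printf "%.2f\n", (cap * pe) / written }'
}

# Hypothetical example values, not measured data
theoretical_gb 16 3000      # prints 48000
waf 16 3000 24000           # prints 2.00
```

A WAF of 2.00 in this made-up case would mean the host could write only half of the theoretical volume before the flash is worn out.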
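Because of the NAND flash controller's address mapping and the WAF described above, a wear test cannot map host writes directly onto P/E cycles of the NAND flash cells, and the WAF itself changes dynamically with the write pattern. We therefore rely on the Linux mmc-utils tool, or on the eMMC vendor's dedicated software, to read the Health Status information contained in the Extended CSD rev 1.7 (MMC 5.0). By checking whether each 10% step of the reported lifetime corresponds linearly to the amount of data actually written, and by recording the total amount of data written, the actual WAF can be estimated.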
eMMC (Linux) | Toradex Developer Center
The Health Status and Spare Block registers in the Extended CSD are defined as follows:
./ Device life time estimation type A/B: life time estimation based on blocks P/E cycles, provided in steps of 10%, e.g.:
0x02 means 10%-20% device life time used.
./ Pre EOL information: overall status for reserved blocks. Possible values are:
0x00 - Not defined.
0x01 - Normal: consumed less than 80% of the reserved blocks.
0x02 - Warning: consumed 80% of the reserved blocks.
0x03 - Urgent: consumed 90% of the reserved blocks.
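The register values above can be turned into readable text with a small helper. The sketch below is only an illustration: the function names are hypothetical, and the handling of 0x0B (lifetime exceeded) follows my understanding of the JEDEC eMMC 5.0 specification rather than anything stated in this article.

```shell
#!/bin/sh
# Decode a DEVICE_LIFE_TIME_EST_TYP_A/B value (for example 0x02) into text.
decode_life_time() {
    val=$(( $1 ))          # shell arithmetic accepts 0x.. hex literals
    if [ "$val" -ge 1 ] && [ "$val" -le 10 ]; then
        echo "$(( (val - 1) * 10 ))%-$(( val * 10 ))% device life time used"
    elif [ "$val" -eq 11 ]; then
        # Assumption: 0x0B means the estimated life time has been exceeded
        echo "exceeded maximum estimated device life time"
    else
        echo "not defined"
    fi
}

# Decode a PRE_EOL_INFO value using the list above.
decode_pre_eol() {
    case $(( $1 )) in
        1) echo "Normal" ;;
        2) echo "Warning" ;;
        3) echo "Urgent" ;;
        *) echo "Not defined" ;;
    esac
}

decode_life_time 0x02   # prints 10%-20% device life time used
decode_pre_eol 0x01     # prints Normal
```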
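The platform used in this article is the Toradex Verdin i.MX8MM embedded platform.
2). Preparation
a). A Verdin i.MX8MM ARM computer-on-module on a Dahlia carrier board, with the debug serial port connected for the tests below.
b). Download the Toradex Yocto Linux BSP6 Reference Image as described here.
c). Install the downloaded BSP image onto the Verdin i.MX8MM module by following the instructions here.
d). Prepare an SD card and use the downloaded BSP image to create a bootable SD card by following the instructions here.
3). Test procedure
a). Insert the SD card into the Dahlia carrier board and power on. The system automatically boots from the external SD card (mmc1) first, which can be confirmed from the debug serial log below.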
-------------------------------
......
Hit any key to stop autoboot:  0
switch to partitions #0, OK
mmc1 is current device
Scanning mmc 1:1...
Found U-Boot script /boot.scr
......
-------------------------------
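b). Because the system automatically mounts the eMMC partitions, automatic mounting must be disabled first so that the subsequent tests can use the raw device.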
-------------------------------
root@verdin-imx8mm-07276322:~# mount |grep /dev/mmcblk0
/dev/mmcblk0p2 on /media/RFS-mmcblk0p2 type ext4 (rw,relatime)
/dev/mmcblk0p1 on /media/BOOT-mmcblk0p1 type vfat (rw,relatime,gid=6,fmask=0007,dmask=0007,allo)
-------------------------------
Run the script below under Linux on the device to disable automatic mounting. Once it succeeds, the mount entries shown above are gone.
-------------------------------
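#!/bin/sh -e
# Unmount the eMMC partitions that were mounted automatically
systemd-umount /dev/mmcblk0p1
systemd-umount /dev/mmcblk0p2
# Stop udev (which applies the automount rules) and systemd-remount-fs
systemctl stop systemd-udevd
systemctl stop systemd-remount-fs
# Remove the udev automount rules if any are present
count=$(ls -1 /etc/udev/rules.d/*automount.rules 2>/dev/null | wc -l)
if [ "$count" != 0 ]
then
    rm /etc/udev/rules.d/*automount.rules
fi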
-------------------------------
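Before writing to the raw device, it may be worth confirming that no eMMC partition is still mounted. The following is a minimal sketch of such a check; `is_mounted` is a hypothetical helper, and it assumes the mount table is available at /proc/mounts.

```shell
#!/bin/sh
# Return success if any partition of the given device appears in the mount table.
# $1 = device prefix, $2 = mount table file (defaults to /proc/mounts)
is_mounted() {
    grep -q "^$1" "${2:-/proc/mounts}"
}

if is_mounted /dev/mmcblk0; then
    echo "eMMC partitions are still mounted"
else
    echo "eMMC is not mounted"
fi
```

The optional second argument only exists so that the check can be tried against a saved copy of a mount table.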
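c). Next, use a Linux disk tool to write large amounts of data and wear the eMMC. This article uses fio; other tools such as dd or hdparm can also be chosen as appropriate.
./ First create a fio configuration file similar to the following. See the official fio documentation for details on each option.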
-------------------------------
[global]
bs=32k
direct=0
ioengine=libaio
iodepth=4
verify=crc32c
filename=/dev/mmcblk0 ; emmc device filename
verify_dump=1
verify_fatal=1
randrepeat=0
[write-once]
description=Write once area, used for testing data retention
stonewall
rw=write
verify_pattern=0xaa555aa5 ; fixed data pattern
size=256M
offset=0
[verify-write-once]
description=Verify write once area, used for testing data retention
stonewall
rw=read
verify_only
size=256M
offset=0
[write]
description=Write r/w stress data area with random data
stonewall
rw=write
do_verify=0
offset=256M
[verify]
description=Verify r/w stress data area
stonewall
rw=read
verify_only
offset=256M
-------------------------------
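Note: bs (block size) should be set according to the Optimal Write Size defined in the datasheet of the eMMC in use, to keep the WAF as small as possible. For the eMMC tested here, the register value read from the device is shown below and corresponds to 32KB. The bs parameter in the fio configuration file is therefore set to 32k, or an integer multiple of it, which ensures that writes to the NAND flash cells are aligned to the page size.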
-------------------------------
$ mmc extcsd read /dev/mmcblk0 | grep write
Optimal write size [OPTIMAL_WRITE_SIZE: 0x08]
-------------------------------
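The conversion from the register value to a fio block size can be sketched as follows. This assumes the OPTIMAL_WRITE_SIZE field counts in units of 4 KiB, which is consistent with 0x08 giving 32KB above; the helper names are hypothetical.

```shell
#!/bin/sh
# Convert an OPTIMAL_WRITE_SIZE register value to KiB.
# Assumption: the register counts in units of 4 KiB, so 0x08 gives 32 KiB.
optimal_write_kib() {
    echo $(( $1 * 4 ))
}

# Extract the hex value from a line of "mmc extcsd read" output on stdin.
extract_hex() {
    sed -n 's/.*\[OPTIMAL_WRITE_SIZE: \(0x[0-9a-fA-F]*\)].*/\1/p'
}

line='Optimal write size [OPTIMAL_WRITE_SIZE: 0x08]'
val=$(echo "$line" | extract_hex)
echo "bs=$(optimal_write_kib "$val")k"   # prints bs=32k
```

On the device, the sample `line` could be replaced by the output of the mmc command shown above.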
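./ Then run a test script similar to the following to perform one write-and-verify pass. This confirms that the fio configuration is correct and usable, and shows the current eMMC Health Status.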
-------------------------------
#!/bin/bash -e
EMMC_DEVICE=/dev/mmcblk0
FIO_TEST_NAME=emmc-pe-test.fio
echo ">> eMMC P/E test preparation on ${EMMC_DEVICE}"
echo ">> eMMC EXTCSD Health Status"
mmc extcsd read "${EMMC_DEVICE}" | fgrep -A1 DEVICE_LIFE_TIME_EST
mmc extcsd read "${EMMC_DEVICE}" | fgrep -A1 PRE_EOL_INFO
echo ">> Write once data"
fio --section=write-once "${FIO_TEST_NAME}"
echo ">> Verify write once data"
fio --section=verify-write-once "${FIO_TEST_NAME}"
-------------------------------
./ Finally, use the looping script below to write continuously and observe how the eMMC wears.
-------------------------------
#!/bin/bash -e
EMMC_DEVICE=/dev/mmcblk0
COUNT=0
FIO_TEST_NAME=emmc-pe-test.fio
echo ">> Starting eMMC P/E test on ${EMMC_DEVICE}"
while true
do
        echo ">> Run $COUNT"
        echo ">> eMMC EXTCSD Health Status"
        mmc extcsd read "${EMMC_DEVICE}" | fgrep -A1 DEVICE_LIFE_TIME_EST
        mmc extcsd read "${EMMC_DEVICE}" | fgrep -A1 PRE_EOL_INFO
        echo ">> Check write once data"
        fio --section=verify-write-once "${FIO_TEST_NAME}"
        echo ">> Wear eMMC"
        fio --section=write --section=verify "${FIO_TEST_NAME}"
        COUNT=$((COUNT + 1))
done
-------------------------------
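./ The log of one full-device write-and-verify pass of the wear test is shown below. Because a complete test takes a very long time, typically several days or even more than ten days depending on the eMMC capacity, this article does not show the final results. Once the reported lifetime has passed 90%, the complete log can be used to compile a table of the actual number of P/E passes and the amount of data written for each 10% of wear. From that table you can derive the full-lifetime write volume of the eMMC, whether the wear is linear, and the actual WAF. Also note that there is no standard definition of what LIFE_TIME_EST_A and LIFE_TIME_EST_B each cover. Each vendor defines them, so the vendor's definition takes precedence.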
-------------------------------
>> Starting eMMC P/E test on /dev/mmcblk0
>> Run 0
>> eMMC EXTCSD Health Status
Device life time estimation type B [DEVICE_LIFE_TIME_EST_TYP_B: 0x01]
 i.e. 0% - 10% device life time used
Device life time estimation type A [DEVICE_LIFE_TIME_EST_TYP_A: 0x01]
 i.e. 0% - 10% device life time used
Pre EOL information [PRE_EOL_INFO: 0x01]
 i.e. Normal
>> Check write once data
verify-write-once: (g=0): rw=read, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.4
fio-3.30
Starting 1 process
Jobs: 1 (f=1)
verify-write-once: (groupid=0, jobs=1): err= 0: pid=583: Fri Apr 29 20:04:38 2022
  Description  : [Verify write once area, used for testing data retention]
  read: IOPS=4908, BW=153MiB/s (161MB/s)(256MiB/1669msec)
...
Run status group 0 (all jobs):
   READ: bw=153MiB/s (161MB/s), 153MiB/s-153MiB/s (161MB/s-161MB/s), io=256MiB (268MB), run=166c
Disk stats (read/write):
  mmcblk0: ios=1009/0, merge=0/0, ticks=2390/0, in_queue=2391, util=94.47%
>> Wear eMMC
write: (g=0): rw=write, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioeng4
verify: (g=1): rw=read, bs=(R) 32.0KiB-32.0KiB, (W) 32.0KiB-32.0KiB, (T) 32.0KiB-32.0KiB, ioeng4
fio-3.30
Starting 2 processes
Jobs: 1 (f=1): [_(1),V(1)][100.0%][eta 00m:00s]
write: (groupid=0, jobs=1): err= 0: pid=590: Fri Apr 29 20:17:15 2022
  Description  : [Write r/w stress data area with random data]
  write: IOPS=732, BW=22.9MiB/s (24.0MB/s)(14.4GiB/642435msec); 0 zone resets
...
verify: (groupid=1, jobs=1): err= 0: pid=607: Fri Apr 29 20:17:15 2022
  Description  : [Verify r/w stress data area]
  read: IOPS=4812, BW=150MiB/s (158MB/s)(14.4GiB/97725msec)
...
Run status group 0 (all jobs):
  WRITE: bw=22.9MiB/s (24.0MB/s), 22.9MiB/s-22.9MiB/s (24.0MB/s-24.0MB/s), io=14.4GiB (15.4GB),c
Run status group 1 (all jobs):
   READ: bw=150MiB/s (158MB/s), 150MiB/s-150MiB/s (158MB/s-158MB/s), io=14.4GiB (15.4GB), run=9c
Disk stats (read/write):
  mmcblk0: ios=58819/29449, merge=0/3732727, ticks=143387/81519893, in_queue=81663280, util=99.%
...
-------------------------------
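To find out at which run each 10% lifetime step is reached, the saved log can be scanned for changes in the reported value. The sketch below is one possible way to do this with awk. `life_time_steps` is a hypothetical helper, it only looks at the type A estimate, and the short sample log fed to it is synthetic and not real test output.

```shell
#!/bin/sh
# Print the run number at which DEVICE_LIFE_TIME_EST_TYP_A first reports each value.
# Reads the log of the looping test script on stdin.
life_time_steps() {
    awk '
        /^>> Run / { run = $3 }
        /DEVICE_LIFE_TIME_EST_TYP_A/ {
            val = $NF              # last field, for example "0x01]"
            sub(/\]/, "", val)     # strip the closing bracket
            if (val != last) {
                print "run " run ": TYP_A=" val
                last = val
            }
        }
    '
}

# Synthetic sample input, only to show the output format
life_time_steps <<'EOF'
>> Run 0
Device life time estimation type A [DEVICE_LIFE_TIME_EST_TYP_A: 0x01]
>> Run 1
Device life time estimation type A [DEVICE_LIFE_TIME_EST_TYP_A: 0x01]
>> Run 2
Device life time estimation type A [DEVICE_LIFE_TIME_EST_TYP_A: 0x02]
EOF
```

Multiplying each reported run number by the amount of data written per pass (14.4GiB in the log above) gives the cumulative host data at each step, which is the input needed for the WAF estimate.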
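4). Summary
This article described and demonstrated an eMMC wear-test procedure on the NXP i.MX8MM ARM processor platform. Because the test writes sequentially, its results may differ from what a specific real application sees. In a real application, to maximize the life and reliability of the eMMC, avoid writing every piece of data straight to disk regardless of its size. Instead, accumulate pending data in a buffer and flush it to disk in units that match the eMMC's Optimal Write Size, which keeps the WAF as low as possible and extends the wear life.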
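The buffering advice can be illustrated with a small calculation: given the number of bytes currently buffered, flush only whole Optimal Write Size units and keep the remainder for later. This is a sketch under stated assumptions. `flushable_bytes` is a hypothetical helper, 32768 is the 32KB Optimal Write Size of the eMMC tested here, and 70000 is a made-up buffer fill level.

```shell
#!/bin/sh
# Number of buffered bytes that can be flushed as whole write units.
# $1 = bytes currently buffered, $2 = optimal write size in bytes
flushable_bytes() {
    echo $(( $1 / $2 * $2 ))
}

buffered=70000
unit=32768
flush=$(flushable_bytes "$buffered" "$unit")
echo "flush $flush bytes, keep $(( buffered - flush )) bytes buffered"
# prints: flush 65536 bytes, keep 4464 bytes buffered
```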