I recently migrated my bulk data storage pool (ZFS On Linux 0.6.2, Debian Wheezy) from a single-device vdev configuration to a two-way mirror vdev configuration.
The previous pool configuration was:
NAME                       STATE     READ WRITE CKSUM
akita                      ONLINE       0     0     0
  ST4000NM0033-Z1Z1A0LQ    ONLINE       0     0     0
Everything was fine after the resilver completed (I initiated a scrub once the resilver had finished, just to have the system go over everything once more and make sure it was all good):
pool: akita
state: ONLINE
scan: scrub repaired 0 in 6h26m with 0 errors on Sat May 17 06:16:06 2014
config:
NAME                       STATE     READ WRITE CKSUM
akita                      ONLINE       0     0     0
  mirror-0                 ONLINE       0     0     0
    ST4000NM0033-Z1Z1A0LQ  ONLINE       0     0     0
    ST4000NM0033-Z1Z333ZA  ONLINE       0     0     0
errors: No known data errors
However, after a reboot I got an email telling me that the pool was not all fine and dandy. I had a look, and this is what I saw:
pool: akita
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub in progress since Sat May 17 14:20:15 2014
316G scanned out of 1,80T at 77,5M/s, 5h36m to go
0 repaired, 17,17% done
config:
NAME                       STATE     READ WRITE CKSUM
akita                      DEGRADED     0     0     0
  mirror-0                 DEGRADED     0     0     0
    ST4000NM0033-Z1Z1A0LQ  ONLINE       0     0     0
    ST4000NM0033-Z1Z333ZA  UNAVAIL      0     0     0
errors: No known data errors
The scrub was expected; there is a cron job set up to initiate a full system scrub on reboot. What I absolutely did not expect, however, was for the new HDD to fall out of the mirror.
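For reference, the cron job in question is essentially just an @reboot entry along these lines (the file name and paths here are approximations, not a verbatim copy):

# /etc/cron.d/zfs-scrub (approximate)
# Kick off a full scrub of the pool once the system has come back up.
@reboot root /sbin/zpool scrub akita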
I define aliases that map to the /dev/disk/by-id/wwn-* names, and for both of these disks I have given ZFS free rein to use the full disk, including handling partitioning:
# zpool history akita | grep ST4000NM0033
2013-09-12.18:03:06 zpool create -f -o ashift=12 -o autoreplace=off -m none akita ST4000NM0033-Z1Z1A0LQ
2014-05-15.15:30:59 zpool attach -o ashift=12 -f akita ST4000NM0033-Z1Z1A0LQ ST4000NM0033-Z1Z333ZA
#
These are the relevant lines from /etc/zfs/vdev_id.conf (I notice now that the Z1Z333ZA line uses tabs for separation whereas the Z1Z1A0LQ line uses only spaces, but I honestly can't see how that would be relevant here):
alias ST4000NM0033-Z1Z1A0LQ /dev/disk/by-id/wwn-0x5000c500645b0fec
alias ST4000NM0033-Z1Z333ZA /dev/disk/by-id/wwn-0x5000c50065e8414a
When I looked, /dev/disk/by-id/wwn-0x5000c50065e8414a* were there as expected, but /dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA* were not.
Issuing sudo udevadm trigger caused the symlinks to show up in /dev/disk/by-vdev. However, ZFS does not seem to realize they are there (Z1Z333ZA still shows as UNAVAIL). I suppose that much is to be expected.
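To spell out what I did to get the links back (the settle step is just me being explicit about waiting for the udev event queue to drain; the trigger is what I actually issued):

# Re-run the udev rules so vdev_id regenerates the by-vdev symlinks,
# then wait for the event queue and verify the links exist.
sudo udevadm trigger
sudo udevadm settle
ls -l /dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA*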
I tried replacing the relevant device, but had no real luck:
# zpool replace akita ST4000NM0033-Z1Z333ZA
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-vdev/ST4000NM0033-Z1Z333ZA-part1 is part of active pool 'akita'
#
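I did not go as far as forcing it the way the error output suggests, i.e. something like the line below, since the partition is reported as part of the active pool and overriding that felt unwise:

# Not run -- forcing the replace as the error message suggests:
# zpool replace -f akita ST4000NM0033-Z1Z333ZA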
Both disks are detected during the boot process (dmesg output showing the relevant drives):
[ 2.936065] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.936137] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.937446] ata4.00: ATA-9: ST4000NM0033-9ZM170, SN03, max UDMA/133
[ 2.937453] ata4.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.938516] ata4.00: configured for UDMA/133
[ 2.992080] ata6: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 3.104533] ata6.00: ATA-9: ST4000NM0033-9ZM170, SN03, max UDMA/133
[ 3.104540] ata6.00: 7814037168 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 3.105584] ata6.00: configured for UDMA/133
[ 3.105792] scsi 5:0:0:0: Direct-Access ATA ST4000NM0033-9ZM SN03 PQ: 0 ANSI: 5
[ 3.121245] sd 3:0:0:0: [sdb] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[ 3.121372] sd 3:0:0:0: [sdb] Write Protect is off
[ 3.121379] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 3.121426] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 3.122070] sd 5:0:0:0: [sdc] 7814037168 512-byte logical blocks: (4.00 TB/3.63 TiB)
[ 3.122176] sd 5:0:0:0: [sdc] Write Protect is off
[ 3.122183] sd 5:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 3.122235] sd 5:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Both drives are connected directly to the motherboard; no off-board controller is involved.
On a whim, I ran:
# zpool online akita ST4000NM0033-Z1Z333ZA
This seems to have worked; Z1Z333ZA is now at least ONLINE and resilvering. About an hour into the resilver it has scanned 180G and resilvered 24G, 9.77% done, which suggests it is not doing a full resilver but only transferring the dataset delta.
Honestly, I'm not sure whether this issue is related to ZFS On Linux or to udev (it smells a bit like udev, but then why would one drive be detected just fine and not the other?), but my question is: how do I make sure the same thing doesn't happen again on the next reboot?
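In case it helps frame an answer: if this does turn out to be a udev/device-naming ordering issue, the sort of workaround I have been contemplating (purely a guess on my part, untested, and I am not even certain the 0.6.2 Debian packaging honours this variable) would be to restrict the import search path in /etc/default/zfs:

# /etc/default/zfs -- untested idea, not currently in place
# Have the init scripts import the pool via the by-vdev aliases first,
# falling back to the WWN-based names.
ZPOOL_IMPORT_PATH="/dev/disk/by-vdev:/dev/disk/by-id"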
I'll be happy to provide more data on the setup if necessary; just let me know what is needed.