nfs+drbd+heartbeat

nfs+drbd+heartbeat: this scheme applies wherever NFS, or a distributed store such as MFS, still has a single point of failure.

In real production environments, NFS is one of the most common storage solutions for small and medium-sized businesses. It is simple to deploy and easy to maintain: with inotify+rsync you get simple, efficient synchronization that replicates the NFS store to standby machines in a master/slave fashion, much like MySQL r/w splitting. Multiple read slaves can then be load-balanced with LVS or haproxy, which both spreads heavy concurrent read traffic and removes the slaves as single points of failure.
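The inotify+rsync pairing mentioned above boils down to a watch-and-push loop. A minimal sketch, assuming a slave reachable at 172.16.1.114 that exports an rsync daemon module named `data` (both the address and the module name are illustrative, not from the original text):

```shell
#!/bin/bash
# Watch the NFS export and push every change to a slave.
# Hypothetical slave address and rsync module; adjust to your environment.
WATCH_DIR="/drbd/"
SLAVE="172.16.1.114::data"

# Build the rsync command used for each push (factored out so it can be tested).
build_sync_cmd() {
    echo "rsync -az --delete $1 $2"
}

# Daemon loop: only runs when invoked with "run", so sourcing this file
# (e.g. for testing) does not block on inotifywait.
if [ "${1:-}" = "run" ]; then
    inotifywait -mrq -e create,delete,modify,move --format '%w%f' "$WATCH_DIR" |
    while read -r changed; do
        $(build_sync_cmd "$WATCH_DIR" "$SLAVE")
    done
fi
```

In production this loop is usually wrapped by sersync or a similar tool, which batches events and retries failed pushes.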

[figure]

Storage options for the web servers:

Option 1: reads and writes may land on any web server; inotify+rsync replicates each web server's data to the others (e.g. web1-->web2-->web3-->web2-->web1).

Option 2: configure the load balancer so that writes (file uploads) go only to web3 and reads go to web{1,2}; inotify+rsync then syncs web3-->web2 and web3-->web1.

Option 3: use shared NFS storage. A single server handling both reads and writes is a single point of failure, so add a second server as master/backup, syncing between them with inotify+rsync. You can put both reads and writes on one server and keep the other purely for backup, or write to the master and read from the backup. Since workloads are usually read-heavy, add more backups for a one-master/many-slave layout that spreads the read load, syncing master-->slave1 and master-->slave2. If a storage node fails, web{1,2,3} must remount; losing a slave is harmless, but losing the master stops writes. So give the master its own HA pair, master-active and master-inactive: only one of the two serves at a time, and master-inactive stays idle until a failover makes it active.

Option 4: drop the shared NFS mounts but keep master-active/master-inactive; write data only to the shared store, then sync it back out to the local disks of web{1,2,3}, and serve every read from local disk.

In the one-master/many-slave model, to keep writes available when the master dies and keep the slaves in sync, use nfs+drbd+heartbeat to make the master highly available and remove its single point of failure. When the master-active NFS node fails over to master-inactive, the two masters hold identical data, and master-inactive automatically resumes syncing to all the NFS slaves, giving the NFS storage system a hot standby.

When master-active fails over to master-inactive, the backup node must still be able to sync data to the NFS slaves, and it must sync only the changes made since the switch rather than everything. sersync can replace inotify here (using its -r option); alternatively, leave inotify stopped at first and start it only after heartbeat on the backup node has started and mounted the device.
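The "start the sync only after the mount is ready" variant can be sketched as a small gate; the mount point and service name below are illustrative assumptions:

```shell
#!/bin/bash
# Start the sync daemon only once heartbeat has mounted the DRBD device,
# so a freshly promoted standby never pushes an empty tree to the slaves.
MOUNT_POINT="/drbd"

mounted() {
    # Check the kernel mount table for the given mount point.
    awk -v m="$1" '$2 == m {found=1} END {exit !found}' /proc/mounts
}

if [ "${1:-}" = "wait" ]; then
    until mounted "$MOUNT_POINT"; do sleep 2; done
    service sersync start   # or launch the inotify+rsync loop here
fi
```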

Note: compared with MFS, FastDFS, GFS and other distributed filesystems, this scheme is simple to deploy and easy to operate, which fits the keep-it-simple principle. Its drawback is that every node holds a full copy of the data (as with MySQL replication), so syncing large volumes of files can lag. You can split the sync by data directory (similar to MySQL database/table splitting), run multiple sync instances and control the read/write logic in the application to work around latency, and you must be able to monitor the sync status.

For the NFS HA scheme, to keep the NFS clients (slaves) from hanging, unable to read, while the two master nodes switch over, attack the problem from these angles:

keep the rpcbind service running at all times (on the master node, the backup node, and every NFS client);

on each NFS client (NFS slave), monitor the locally mounted NFS share and remount it if reads start failing;

have the NFS clients watch for the VIP appearing on the master-inactive node, or for its drbd state turning Primary, and remount when that happens (during an NFS failover, trigger the remount on the clients over SSH or a similar channel; or monitor with nagios and, when the VIP shows up on the master-inactive node, run a script that remounts all the NFS clients).
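The client-side remount check described above can be sketched as follows; the mount point, NFS source and read probe are assumptions for illustration, and in practice the check would run from cron or a monitor such as nagios:

```shell
#!/bin/bash
# Remount an NFS share when a read probe on it fails.
# Mount point and NFS source below are illustrative.
MOUNT_POINT="/data"
NFS_SRC="10.96.20.8:/drbd"

# Probe: the mount is considered healthy if listing it succeeds quickly.
probe_mount() {
    timeout 5 ls "$1" > /dev/null 2>&1
}

remount_nfs() {
    umount -lf "$1" 2>/dev/null   # lazy+force, in case the mount is stuck
    mount -t nfs "$2" "$1"
}

if [ "${1:-}" = "check" ]; then
    probe_mount "$MOUNT_POINT" || remount_nfs "$MOUNT_POINT" "$NFS_SRC"
fi
```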


[figure]

As the figure shows, the ellipses mark what this section implements.

[figure]

注:單臺server,無需文件存儲,數據放本地,只有做集群的情況下才需要做專門的存儲

[figure]

Note: the problems here are the single point of failure and the poor performance of putting reads and writes on one box; in operations you must always consider data protection and 7x24 continuous service.

[figure]

Note:

web1 and web2 typically run the LNMP stack;

IMG1 and IMG2 typically run nginx or lighttpd;

This design removes the nfs master as a single point of failure and solves concurrent-read performance, but as concurrent writes keep growing it runs into the following problems:

It handles roughly 200-300 image uploads per second with acceptable sync throughput; above 300/s the master-to-slave sync may start to lag. Remedies: sync with multiple threads, and tune event monitoring, disk I/O and network I/O;

With many IMG servers there is still only one master, which both takes every write and feeds every slave, so its load becomes very heavy;

When the image volume grows very large, every node still holds the complete data set, and beyond roughly 3 TB a single server may run out of space. Remedies: (1) borrow the MySQL database-splitting idea to solve capacity, write performance and sync lag together, e.g. start with five directories img1--img5 mapped to five domains, mount those five directories, and grow each imgNUM into its own nfs master/slave HA cluster with r/w splitting (the r/w split can be done via POST or WebDAV); (2) extend to a multi-master architecture via DNS, though every added service is another single point to manage; (3) use the internal replication of MySQL, Oracle, MongoDB, Cassandra and similar databases to sync the file data; iQiyi uses MongoDB GridFS for image storage.
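The directory split in remedy (1) rests on a stable mapping from file name to shard. A minimal sketch, assuming five shards img1--img5 chosen by hashing the file name with cksum (the hash choice and shard count are illustrative):

```shell
#!/bin/bash
# Map an image file name to one of N shard directories (img1..img5),
# so the same name always lands in the same NFS cluster.
SHARDS=5

shard_for() {
    # cksum gives a stable CRC of the name; take it modulo the shard count.
    local crc
    crc=$(printf '%s' "$1" | cksum | awk '{print $1}')
    echo "img$(( crc % SHARDS + 1 ))"
}
```

Each imgN then maps to its own domain and its own nfs master/slave HA cluster, as described above.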

注:mongodbGridFS做圖片存儲(支持分布式,設計思路:圖片存儲唯一;只存原始圖;首次請求生成縮略圖并生成靜態文件;url固定,根據不同url產生縮略圖;參考Abusing Amazon p_w_picpaths

注:facebook圖片管理架構

[figure]

Note:

The nfs HA pair removes the single point of failure at the cost of one idle server;

the two nfs masters are tied together by heartbeat+drbd, replicating in real time over drbd protocol C;

nfs(M) and nfs(S) sync asynchronously via inotify+rsync, the slaves pulling from nfs(M) through the VIP; the nfs slaves serve reads and the nfs master takes writes, which solves concurrent-read performance; alternatively, let nfs(M) take writes only and push the data out to the app servers (dropping the nfs mounts entirely);

choose RAID10 or RAID0 for the physical disks according to your performance and redundancy needs; connect the servers to each other and to the switch with bonded dual gigabit NICs; the application servers (web and otherwise) reach nfs(M) through one VIP and the load-balanced nfs(S) storage pool through another; nfs(M) keeps its data on the drbd partition;

when the data volume is small, sync it straight from nfs(M) to the app servers' local disks, serve every read locally, and send writes to nfs(M);

with inotify+rsync handling the master-->slave sync, heavy concurrent writes can cause lag or missed updates.

Note:

In real-world practice, touch the DB and file-storage layers only as a last resort. Adjust the site architecture first so user requests hit the DB and storage as little as possible, e.g. with file caches and data caches (the core rule of high concurrency: push every user request as far toward the front as you can). Do not reach for a distributed storage system up front; for a small or mid-sized business that is using a cannon to kill a mosquito. In 2012, when Facebook was already very large, it was still using NFS storage. Distribution is not a silver bullet: it consumes a great deal of people and hardware, and handled badly it ends in disaster.

[figure]

Note:

To relieve pressure on the site, push the content users fetch as far forward as possible: whatever can live on the user's machine should not go to the CDN, and whatever the CDN can hold should not go to your own servers. Use the cache at every layer, and let users hit the backend DB only as a last resort. If that still isn't enough, use SSD+SATA tiering; only if that fails, use distributed storage.


1. Install and configure heartbeat

Environment:

VIP: 10.96.20.8

master: eth0 (10.96.20.113), eth1 (172.16.1.113, no gateway or DNS), hostname test-master

backup: eth0 (10.96.20.114), eth1 (172.16.1.114, no gateway or DNS), hostname test-backup

two NICs and two disks per node.

Note: eth0 is the management IP; eth1 carries the heartbeat and the drbd replication traffic. In production, if the heartbeat and data share one NIC, rate-limit the data traffic so the heartbeat always has bandwidth.

Note: label your VMs in VMware and your tabs in Xshell consistently; in production, every host should have an entry in /etc/hosts to ease distribution, management and maintenance.

On test-master (on both nodes: set the hostname in /etc/sysconfig/network so it exactly matches the output of uname -n, fill in /etc/hosts, set up mutual ssh trust, sync time, and sort out iptables and selinux):

[root@test-master ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 6.5 (Santiago)

[root@test-master ~]# uname -rm

2.6.32-431.el6.x86_64 x86_64

[root@test-master ~]# uname -n

test-master

[root@test-master ~]# ifconfig | grep eth0 -A 1

eth0      Link encap:Ethernet  HWaddr 00:0C:29:1F:B6:AC

          inet addr:10.96.20.113  Bcast:10.96.20.255  Mask:255.255.255.0

[root@test-master ~]# ifconfig | grep eth1 -A 1

eth1      Link encap:Ethernet  HWaddr 00:0C:29:1F:B6:B6

          inet addr:172.16.1.113  Bcast:172.16.1.255  Mask:255.255.255.0

[root@test-master ~]# route add -host 172.16.1.114 dev eth1   #(add a host route so heartbeat traffic leaves through the given NIC; append this command to /etc/rc.local, or configure a static route: #vim /etc/sysconfig/network-scripts/route-eth1 and add 172.16.1.114/24 via 172.16.1.113)

[root@test-master ~]# ssh-keygen -t rsa -f ./.ssh/id_rsa -P ''

Generating public/private rsa key pair.

Your identification has been saved in ./.ssh/id_rsa.

Your public key has been saved in ./.ssh/id_rsa.pub.

The key fingerprint is:

29:c3:a3:68:81:43:59:2f:0a:ad:8a:54:56:b0:1e:12 root@test-master

The key's randomart image is:

+--[ RSA 2048]----+

| E o..          |

| .+ +           |

|.+.* .          |

|oo* o.  .       |

|+o..  =S        |

|+. o . +        |

|o o .           |

| .              |

|                |

+-----------------+

[root@test-master ~]# ssh-copy-id -i ./.ssh/id_rsa root@test-backup

The authenticity of host 'test-backup (10.96.20.114)' can't be established.

RSA key fingerprint is 63:f5:2e:dc:96:64:54:72:8e:14:7e:ec:ef:b8:a1:0c.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'test-backup' (RSA) to the list of known hosts.

root@test-backup's password:

Now try logging into the machine, with "ssh 'root@test-backup'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@test-master ~]# crontab -l

*/5 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null

[root@test-master ~]# service crond restart

Stopping crond:                                            [  OK  ]

Starting crond:                                            [  OK  ]

[root@test-master ~]# wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm

[root@test-master ~]# rpm -ivh epel-release-6-8.noarch.rpm

warning: epel-release-6-8.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY

Preparing...                ########################################### [100%]

   1:epel-release           ########################################### [100%]

[root@test-master ~]# yum search heartbeat

……

heartbeat-devel.i686 : Heartbeat development package

heartbeat-devel.x86_64 : Heartbeat development package

heartbeat-libs.i686 : Heartbeat libraries

heartbeat-libs.x86_64 : Heartbeat libraries

heartbeat.x86_64 : Messaging and membership subsystem for High-Availability Linux

[root@test-master ~]# yum -y install heartbeat

[root@test-master ~]# chkconfig heartbeat off

[root@test-master ~]# chkconfig --list heartbeat

heartbeat       0:off   1:off   2:off   3:off   4:off   5:off   6:off

On test-backup:

[root@test-backup ~]# uname -n

test-backup

[root@test-backup ~]# ifconfig | grep eth0 -A 1

eth0      Link encap:Ethernet  HWaddr 00:0C:29:15:E6:BB

          inet addr:10.96.20.114  Bcast:10.96.20.255  Mask:255.255.255.0

[root@test-backup ~]# ifconfig | grep eth1 -A 1

eth1      Link encap:Ethernet  HWaddr 00:0C:29:15:E6:C5

          inet addr:172.16.1.114  Bcast:172.16.1.255  Mask:255.255.255.0

[root@test-backup ~]# route add -host 172.16.1.113 dev eth1

[root@test-backup ~]# ssh-keygen -t rsa -f ./.ssh/id_rsa -P ''

Generating public/private rsa key pair.

Your identification has been saved in./.ssh/id_rsa.

Your public key has been saved in ./.ssh/id_rsa.pub.

The key fingerprint is:

08:ea:6a:44:7f:1a:c9:bf:ff:01:d5:32:e5:39:1b:b8 root@test-backup

The key's randomart image is:

+--[ RSA 2048]----+

|          .     |

|           =.   |

|    .     = *   |

| . . . .. + +   |

|. + . ..SE .    |

| o = .  .       |

|. . =    .      |

| o . .    .     |

|o    .o...      |

+-----------------+

[root@test-backup ~]# ssh-copy-id -i ./.ssh/id_rsa root@test-master

The authenticity of host 'test-master (10.96.20.113)' can't be established.

RSA key fingerprint is 63:f5:2e:dc:96:64:54:72:8e:14:7e:ec:ef:b8:a1:0c.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'test-master' (RSA) to the list of known hosts.

root@test-master's password:

Now try logging into the machine, with "ssh 'root@test-master'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@test-backup ~]# crontab -l

*/5 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null

[root@test-backup ~]# service crond restart

Stopping crond:                                            [  OK  ]

Starting crond:                                            [  OK  ]

[root@test-backup ~]# wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm

[root@test-backup ~]# rpm -ivh epel-release-6-8.noarch.rpm

[root@test-backup ~]# yum -y install heartbeat

[root@test-backup ~]# chkconfig heartbeat off

[root@test-backup ~]# chkconfig --list heartbeat

heartbeat       0:off   1:off   2:off   3:off   4:off   5:off   6:off

On test-master:

[root@test-master ~]# cp /usr/share/doc/heartbeat-3.0.4/{ha.cf,authkeys,haresources} /etc/ha.d/

[root@test-master ~]# cd /etc/ha.d

[root@test-master ha.d]# ls

authkeys?ha.cf? harc? haresources?rc.d? README.config? resource.d?shellfuncs

[root@test-master ha.d]# vim authkeys   #(generate a random string with #dd if=/dev/random count=1 bs=512 | md5sum, and put it after sha1)

auth 1

1 sha1 912d6402295ac8d47109e56b177073b9

[root@test-master ha.d]# chmod 600 authkeys   #(this file must be mode 600, otherwise the service refuses to start)

[root@test-master ha.d]# ll !$

ll authkeys

-rw-------. 1 root root 692 Aug  7 21:51 authkeys

[root@test-master ha.d]# vim ha.cf

debugfile /var/log/ha-debug   #(debug log)

logfile /var/log/ha-log

logfacility     local1   #(configure the rsyslog service to receive these logs via local1)

keepalive 2   #(heartbeat interval: send a heartbeat every 2s)

deadtime 30   #(if the backup node hears nothing from the master for 30s, it immediately takes over the service resources)

warntime 10   #(heartbeat-delay warning: after 10s without a heartbeat from the master, the backup node logs a warning but does not fail over)

initdead 120   #(after heartbeat first starts, wait 120s before bringing up resources on the master node; this gives the peer's heartbeat time to start, and the value must be at least twice deadtime)

udpport 694

#bcast  eth0   #(broadcast heartbeats over Ethernet on eth0; to carry the heartbeat over two physical networks use bcast eth0 eth1)

mcast eth0 225.0.0.11 694 1 0   #(multicast parameters; the multicast group must be unique on the LAN since several heartbeat clusters may coexist; use a class D address (224.0.0.0--239.255.255.255); format: mcast dev mcast_group port ttl loop)

auto_failback on   #(fail back once the master node recovers)

node test-master   #(master node hostname, the output of uname -n)

node test-backup   #(backup node hostname)

crm no   #(whether to enable the CRM)

[root@test-master ha.d]# vim haresources

test-master     IPaddr::10.96.20.8/24/eth0   #(this entry is equivalent to running #/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 stop|start; IPaddr is the script under /etc/ha.d/resource.d/)

[root@test-master ha.d]# scp authkeys ha.cf haresources root@test-backup:/etc/ha.d/

authkeys                          100%  692     0.7KB/s   00:00

ha.cf                             100%   10KB  10.3KB/s   00:00

haresources                       100% 5944     5.8KB/s   00:00

[root@test-master ha.d]# service heartbeat start

Starting High-Availability services: INFO:  Resource is stopped

Done.


[root@test-master ha.d]# ssh test-backup 'service heartbeat start'

Starting High-Availability services: 2016/08/07_22:39:00 INFO:  Resource is stopped

Done.

[root@test-master ha.d]# ps aux | grep heartbeat

root      63089  0.0  3.1  50124  7164 ?        SLs  22:38   0:00 heartbeat: master control process

root      63093  0.0  3.1  50076  7116 ?        SL   22:38   0:00 heartbeat: FIFO reader

root      63094  0.0  3.1  50072  7112 ?        SL   22:38   0:00 heartbeat: write: mcast eth0

root      63095  0.0  3.1  50072  7112 ?        SL   22:38   0:00 heartbeat: read: mcast eth0

root      63136  0.0  0.3 103264   836 pts/0    S+   22:39   0:00 grep heartbeat

[root@test-master ha.d]# ssh test-backup 'ps aux | grep heartbeat'

root       3050  0.0  3.1  50124  7164 ?        SLs  22:39   0:00 heartbeat: master control process

root       3054  0.0  3.1  50076  7116 ?        SL   22:39   0:00 heartbeat: FIFO reader

root       3055  0.0  3.1  50072  7112 ?        SL   22:39   0:00 heartbeat: write: mcast eth0

root       3056  0.0  3.1  50072  7112 ?        SL   22:39   0:00 heartbeat: read: mcast eth0

root       3094  0.0  0.5 106104  1368 ?        Ss   22:39   0:00 bash -c ps aux | grep heartbeat

root       3108  0.0  0.3 103264   832 ?        S    22:39   0:00 grep heartbeat

[root@test-master ha.d]# netstat -tnulp | grep heartbeat

udp        0      0 225.0.0.11:694              0.0.0.0:*                               63094/heartbeat: wr

udp        0      0 0.0.0.0:50268               0.0.0.0:*                               63094/heartbeat: wr

[root@test-master ha.d]# ssh test-backup 'netstat -tnulp | grep heartbeat'

udp        0      0 0.0.0.0:58019               0.0.0.0:*                               3055/heartbeat: wri

udp        0      0 225.0.0.11:694              0.0.0.0:*                               3055/heartbeat: wri

[root@test-master ha.d]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

[root@test-master ha.d]# service heartbeat stop

Stopping High-Availability services: Done.


[root@test-master ha.d]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ha.d]# service heartbeat start

Starting High-Availability services: INFO:  Resource is stopped

Done.


[root@test-master ha.d]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

[root@test-master ~]# service heartbeat stop

Stopping High-Availability services: Done.


[root@test-master ~]# ssh test-backup 'service heartbeat stop'

Stopping High-Availability services: Done.

2. Install and configure drbd

On test-master:

[root@test-master ~]# fdisk -l

……

Disk /dev/sdb: 2147 MB, 2147483648 bytes

255 heads, 63 sectors/track, 261 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

[root@test-master ~]# parted /dev/sdb   #(parted handles disks larger than 2 TB; split the new disk into two partitions, one for data and one for drbd's meta data)

GNU Parted 2.1

Using /dev/sdb

Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) h

  align-check TYPE N                        check partition N for TYPE(min|opt) alignment

  check NUMBER                              do a simple check on the file system

  cp [FROM-DEVICE] FROM-NUMBER TO-NUMBER    copy file system to another partition

  help [COMMAND]                            print general help, or help on COMMAND

  mklabel,mktable LABEL-TYPE                create a new disklabel (partition table)

  mkfs NUMBER FS-TYPE                       make a FS-TYPE file system on partition NUMBER

  mkpart PART-TYPE [FS-TYPE] START END      make a partition

  mkpartfs PART-TYPE FS-TYPE START END      make a partition with a file system

  move NUMBER START END                     move partition NUMBER

  name NUMBER NAME                          name partition NUMBER as NAME

  print [devices|free|list,all|NUMBER]      display the partition table, available devices, free space, all found partitions, or a particular partition

  quit                                      exit program

  rescue START END                          rescue a lost partition near START and END

  resize NUMBER START END                   resize partition NUMBER and its file system

  rm NUMBER                                 delete partition NUMBER

  select DEVICE                             choose the device to edit

  set NUMBER FLAG STATE                     change the FLAG on partition NUMBER

  toggle [NUMBER [FLAG]]                    toggle the state of FLAG on partition NUMBER

  unit UNIT                                 set the default unit to UNIT

  version                                   display the version number and copyright information of GNU Parted

(parted) mklabel gpt

(parted) mkpart primary 0 1024

Warning: The resulting partition is not properly aligned for best performance.

Ignore/Cancel? Ignore

(parted) mkpart primary 1025 2147

Warning: The resulting partition is not properly aligned for best performance.

Ignore/Cancel? Ignore

(parted) p

Model: VMware, VMware Virtual S (scsi)

Disk /dev/sdb: 2147MB

Sector size (logical/physical): 512B/512B

Partition Table: gpt


Number  Start   End     Size    File system  Name     Flags

 1      17.4kB  1024MB  1024MB               primary

 2      1025MB  2147MB  1122MB               primary

[root@test-master ~]# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm

[root@test-master ~]# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm

warning: elrepo-release-6-6.el6.elrepo.noarch.rpm: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY

Preparing...                ########################################### [100%]

   1:elrepo-release         ########################################### [100%]

[root@test-master ~]# yum -y install drbd kmod-drbd84

[root@test-master ~]# modprobe drbd

FATAL: Module drbd not found.

[root@test-master ~]# yum -y install kernel*   #(reboot the system after updating the kernel)

[root@test-master ~]# uname -r

2.6.32-642.3.1.el6.x86_64

[root@test-master ~]# depmod

[root@test-master ~]# lsmod | grep drbd

drbd                  372759  0

libcrc32c               1246  1 drbd

[root@test-master ~]# ll /usr/src/kernels/

total 12

drwxr-xr-x. 22 root root 4096 Mar 31 06:46 2.6.32-431.el6.x86_64

drwxr-xr-x. 22 root root 4096 Aug  8 03:40 2.6.32-642.3.1.el6.x86_64

drwxr-xr-x. 22 root root 4096 Aug  8 03:40 2.6.32-642.3.1.el6.x86_64.debug

[root@test-master ~]# echo "modprobe drbd >/dev/null 2>&1" > /etc/sysconfig/modules/drbd.modules

[root@test-master ~]# cat !$

cat /etc/sysconfig/modules/drbd.modules

modprobe drbd > /dev/null 2>&1

On test-backup:

[root@test-backup ~]# parted /dev/sdb

(parted) mklabel gpt

(parted) mkpart primary 0 4096

Warning: The resulting partition is not properly aligned for best performance.

Ignore/Cancel? Ignore

(parted) mkpart primary 4097 5368

(parted) p

Model: VMware, VMware Virtual S (scsi)

Disk /dev/sdb: 5369MB

Sector size (logical/physical): 512B/512B

Partition Table: gpt


Number  Start   End     Size    File system  Name     Flags

 1      17.4kB  4096MB  4096MB               primary

 2      4097MB  5368MB  1271MB               primary

[root@test-backup ~]# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm

[root@test-backup ~]# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm

[root@test-backup ~]# ll /etc/yum.repos.d/

total 20

-rw-r--r--. 1 root root 1856 Jul 19 00:28 CentOS6-Base-163.repo

-rw-r--r--. 1 root root 2150 Feb  9  2014 elrepo.repo

-rw-r--r--. 1 root root  957 Nov  4  2012 epel.repo

-rw-r--r--. 1 root root 1056 Nov  4  2012 epel-testing.repo

-rw-r--r--. 1 root root  529 Mar 30 23:00 rhel-source.repo.bak

[root@test-backup ~]# yum -y install drbd kmod-drbd84

[root@test-backup ~]# yum -y install kernel*

[root@test-backup ~]# depmod

[root@test-backup ~]# lsmod | grep drbd

drbd                  372759  0

libcrc32c               1246  1 drbd

[root@test-backup ~]# chkconfig drbd off

[root@test-backup ~]# chkconfig --list drbd

drbd            0:off   1:off   2:off   3:off   4:off   5:off   6:off

[root@test-backup ~]# echo "modprobe drbd >/dev/null 2>&1" > /etc/sysconfig/modules/drbd.modules

[root@test-backup ~]# cat !$

cat /etc/sysconfig/modules/drbd.modules

modprobe drbd > /dev/null 2>&1

On test-master:

[root@test-master ~]# vim /etc/drbd.d/global_common.conf

[root@test-master ~]# egrep -v "#|^$" /etc/drbd.d/global_common.conf

global {

        usage-count no;

}

common {

        handlers {

        }

        startup {

        }

        options {

        }

        disk {

                on-io-error detach;

        }

        net {

        }

        syncer {

                rate 50M;

                verify-alg crc32c;

        }

}

[root@test-master ~]# vim /etc/drbd.d/data.res

resource data {

        protocol C;

        on test-master {

                device  /dev/drbd0;

                disk    /dev/sdb1;

                address 172.16.1.113:7788;

                meta-disk       /dev/sdb2[0];

        }

        on test-backup {

                device  /dev/drbd0;

                disk    /dev/sdb1;

                address 172.16.1.114:7788;

                meta-disk       /dev/sdb2[0];

        }

}

[root@test-master ~]# cd /etc/drbd.d

[root@test-master drbd.d]# scp global_common.conf data.res root@test-backup:/etc/drbd.d/

global_common.conf                100% 2144     2.1KB/s   00:00

data.res                          100%  251     0.3KB/s   00:00


[root@test-master drbd.d]# drbdadm --help

USAGE: drbdadm COMMAND [OPTION...]{all|RESOURCE...}

GENERAL OPTIONS:

  --stacked, -S

  --dry-run, -d

  --verbose, -v

  --config-file=..., -c ...

  --config-to-test=..., -t ...

  --drbdsetup=..., -s ...

  --drbdmeta=..., -m ...

  --drbd-proxy-ctl=..., -p ...

  --sh-varname=..., -n ...

  --peer=..., -P ...

  --version, -V

  --setup-option=..., -W ...

  --help, -h


COMMANDS:

 attach                             disk-options

 detach                             connect

 net-options                        disconnect

 up                                 resource-options

 down                               primary

 secondary                          invalidate

 invalidate-remote                  outdate

 resize                             verify

 pause-sync                         resume-sync

 adjust                             adjust-with-progress

 wait-connect                       wait-con-int

 role                               cstate

 dstate                             dump

 dump-xml                           create-md

 show-gi                            get-gi

 dump-md                            wipe-md

 apply-al                           hidden-commands

[root@test-master drbd.d]# drbdadm create-md data

initializing activity log

NOT initializing bitmap

Writing meta data...

New drbd meta data block successfully created.

[root@test-master drbd.d]# ssh test-backup 'drbdadm create-md data'

NOT initializing bitmap

initializing activity log

Writing meta data...

New drbd meta data block successfully created.

[root@test-master drbd.d]# drbdadm up data

[root@test-master drbd.d]# ssh test-backup 'drbdadm up data'

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----

    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:999984

[root@test-master drbd.d]# ssh test-backup 'cat /proc/drbd'

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----

    ns:0 nr:0 dw:0 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:999984

[root@test-master drbd.d]# drbdadm -- --overwrite-data-of-peer primary data   #(run only on the master)

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

    ns:339968 nr:0 dw:0 dr:340647 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:660016

        [=====>..............] sync'ed: 34.3% (660016/999984)K

        finish: 0:00:15 speed: 42,496 (42,496) K/sec

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

    ns:630784 nr:0 dw:0 dr:631463 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:369200

        [===========>........] sync'ed: 63.3% (369200/999984)K

        finish: 0:00:09 speed: 39,424 (39,424) K/sec

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

    ns:942080 nr:0 dw:0 dr:942759 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:57904

        [=================>..] sync'ed: 94.3% (57904/999984)K

        finish: 0:00:01 speed: 39,196 (39,252) K/sec

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

    ns:999983 nr:0 dw:0 dr:1000662 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-master drbd.d]# ssh test-backup 'cat /proc/drbd'

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----

    ns:0 nr:999983 dw:999983 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-master drbd.d]# mkdir /drbd

[root@test-master drbd.d]# ssh test-backup 'mkdir /drbd'

[root@test-master drbd.d]# mkfs.ext4 -b 4096 /dev/drbd0   #(run only on the master; never format the meta partition)

Writing superblocks and filesystem accounting information: done

[root@test-master drbd.d]# tune2fs -c -1 /dev/drbd0

tune2fs 1.41.12 (17-May-2010)

Setting maximal mount count to -1

[root@test-master drbd.d]# mount /dev/drbd0 /drbd

[root@test-master drbd.d]# cd /drbd

[root@test-master drbd]# for i in `seq 1 10`; do touch test$i; done

[root@test-master drbd]# ls

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9

[root@test-master drbd]# cd

[root@test-master ~]# umount /dev/drbd0

[root@test-master ~]# drbdadm secondary data

[root@test-master ~]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----

    ns:1032538 nr:0 dw:32554 dr:1001751 al:19 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

On test-backup:

[root@test-backup ~]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----

    ns:0 nr:1032538 dw:1032538 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-backup ~]# drbdadm primary data

[root@test-backup ~]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

    ns:0 nr:1032538 dw:1032538 dr:679 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-backup ~]# mount /dev/drbd0 /drbd

[root@test-backup ~]# ls /drbd

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9

3. Test heartbeat+drbd

[root@test-master ~]# ssh test-backup 'umount /drbd'

[root@test-master ~]# ssh test-backup 'drbdadm secondary data'

[root@test-master ~]# service drbd stop

Stopping all DRBD resources: .

[root@test-master ~]# ssh test-backup 'service drbd stop'

Stopping all DRBD resources: .

[root@test-master ~]# service heartbeat status

heartbeat is stopped. No process

[root@test-master ~]# ssh test-backup 'service heartbeat status'

heartbeat is stopped. No process

[root@test-master ~]# ll /etc/ha.d/resource.d/{Filesystem,drbddisk}

-rwxr-xr-x. 1 root root 3162 Jan 12  2016 /etc/ha.d/resource.d/drbddisk

-rwxr-xr-x. 1 root root 1903 Dec  2  2013 /etc/ha.d/resource.d/Filesystem

[root@test-master ~]# vim /etc/ha.d/haresources   #(each entry works like a script plus its arguments, e.g. #/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 start|stop, #/etc/ha.d/resource.d/drbddisk data start|stop, #/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext4 start|stop; heartbeat drives the resources in exactly this configured order, so if heartbeat misbehaves, check the logs and run these commands by hand to narrow down the fault)

test-master     IPaddr::10.96.20.8/24/eth0      drbddisk::data  Filesystem::/dev/drbd0::/drbd::ext4

[root@test-master ~]# scp /etc/ha.d/haresources root@test-backup:/etc/ha.d/

haresources                       100% 5996     5.9KB/s   00:00

[root@test-master ~]# service drbd start   #(run on the master node)

Starting DRBD resources: [

     create res: data

   prepare disk: data

    adjust disk: data

     adjust net: data

]

..........

***************************************************************

 DRBD's startup script waits for the peer node(s) to appear.

 - If this node was already a degraded cluster before the

   reboot, the timeout is 0 seconds. [degr-wfc-timeout]

 - If the peer was available before the reboot, the timeout

   is 0 seconds. [wfc-timeout]

   (These values are for resource 'data'; 0 sec -> wait forever)

 To abort waiting enter 'yes' [  23]:

[root@test-backup ~]# service drbd start   #(run on the backup node)

Starting DRBD resources: [

     create res: data

   prepare disk: data

    adjust disk: data

     adjust net: data

]

.

[root@test-master ~]# drbdadm role data

Secondary/Secondary

[root@test-master ~]# ssh test-backup 'drbdadm role data'

Secondary/Secondary

[root@test-master ~]# drbdadm -- --overwrite-data-of-peer primary data

[root@test-master ~]# drbdadm role data

Primary/Secondary

[root@test-master ~]# service heartbeat start

Starting High-Availability services: INFO:  Resource is stopped

Done.

[root@test-master ~]# ssh test-backup 'service heartbeat start'

Starting High-Availability services: 2016/08/09_03:08:11 INFO:  Resource is stopped

Done.

[root@test-master ~]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ~]# drbdadm role data

Primary/Secondary

[root@test-master ~]# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  6.3G   11G  38% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-master ~]# ls /drbd

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9


[root@test-master ~]# service heartbeat stop

Stopping High-Availability services: Done.

[root@test-master ~]# ssh test-backup 'ip addr |grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ~]# ssh test-backup 'df -h'

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  3.9G   13G  24% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-master ~]# ssh test-backup 'ls /drbd'

lost+found

test1

test10

test2

test3

test4

test5

test6

test7

test8

test9


[root@test-master ~]# drbdadm role data

Secondary/Primary

[root@test-master ~]# service heartbeat start   #(after a node recovers, first make sure drbd is back in order and healthy, then start heartbeat)

Starting High-Availability services: INFO:  Resource is stopped

Done.

[root@test-master ~]# drbdadm role data

Primary/Secondary

[root@test-master ~]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ~]# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  6.3G   11G  38% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-master ~]# ls /drbd

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9


Note: if the two ends show Primary/Unknown and Secondary/Unknown (DRBD split brain), recover as follows:

#service heartbeat stop   #(stop the heartbeat service on both ends)

#drbdadm secondary data   #(on the backup node, demote drbd to Secondary)

#drbdadm disconnect data   #(on the backup node)

#drbdadm -- --discard-my-data connect data   #(on the backup node, discard its changes and reconnect)

#drbdadm role data

#drbdadm connect data   #(run on the primary node)
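The manual steps above can be wrapped in a small helper. This is only a sketch: the resource name `data` comes from this setup, and the string being parsed is the `LocalRole/PeerRole` output format of `drbdadm role` on DRBD 8.x.

```shell
#!/bin/bash
# Sketch: print the split-brain recovery commands for this node, based on
# the local role reported by `drbdadm role data` ("LocalRole/PeerRole").
# The Secondary side discards its changes; the Primary side just reconnects.
split_brain_steps() {
    local role=$1
    case ${role%%/*} in
        Secondary) echo "drbdadm disconnect data; drbdadm -- --discard-my-data connect data" ;;
        Primary)   echo "drbdadm connect data" ;;
        *)         echo "unexpected role: $role" ;;
    esac
}

# in production: split_brain_steps "$(drbdadm role data)"
split_brain_steps "Secondary/Unknown"
```

Run this on each node and execute the printed commands on that node, backup side first.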


4. Install and configure NFS

Do the following on both primary nodes and on nfs slave1:

[root@test-master ~]# yum -y groupinstall 'NFS file server'

[root@test-master ~]# rpm -qa nfs-utils rpcbind

nfs-utils-1.2.3-70.el6_8.1.x86_64

rpcbind-0.2.0-12.el6.x86_64

[root@test-master ~]# service rpcbind start

[root@test-master ~]# service nfs start

Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

[root@test-master ~]# chkconfig rpcbind on

[root@test-master ~]# chkconfig nfs on

[root@test-master ~]# chkconfig --list rpcbind

rpcbind         0:off 1:off 2:on 3:on 4:on 5:on 6:off

[root@test-master ~]# chkconfig --list nfs

nfs             0:off 1:off 2:on 3:on 4:on 5:on 6:off


On both primary nodes:

[root@test-master ~]# vim /etc/exports

/drbd   10.96.20.*(rw,sync,all_squash,anonuid=65534,anongid=65534,mp,fsid=2)

[root@test-master ~]# chmod 777 -R /drbd

[root@test-master ~]# service nfs reload   #(equivalent to exportfs -r)
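Each option in the export line above plays a role in failover behaviour; an annotated copy (same values as above):

```shell
# /etc/exports
/drbd  10.96.20.*(rw,sync,all_squash,anonuid=65534,anongid=65534,mp,fsid=2)
# rw,sync       writable; reply to clients only after data reaches disk
# all_squash    map every client user to the anonymous account
# anonuid/gid   anonymous account uid/gid 65534 (nfsnobody)
# mp            export only while /drbd is actually a mounted filesystem,
#               i.e. only while drbd is Primary and mounted on this node
# fsid=2        fixed filesystem id, so both primaries hand out identical
#               file handles and clients survive a VIP switch
```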


5. Testing:

Start heartbeat on both primary nodes.

Mount and test from the nfs slave: works normally.

wKioL1etJfKA0JZwAAD7Xi1PiMw060.jpg

[root@test-master ~]# service heartbeat stop

Stopping High-Availability services:

/sbin/service: line 66: 17235 Killed                  env -i PATH="$PATH" TERM="$TERM" "${SERVICEDIR}/${SERVICE}" ${OPTIONS}

[root@test-master ~]# tail -f /var/log/ha-log   #(during this heartbeat-stop test, the mounted partition could not be unmounted during the switchover, and the server was ultimately force-rebooted)

Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:21 INFO: No processes on /drbd were signalled. force_unmount is
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:22 ERROR: Couldn't unmount /drbd; trying cleanup with KILL
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:22 INFO: No processes on /drbd were signalled. force_unmount is
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:23 ERROR: Couldn't unmount /drbd, giving up!
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[19783]: 2016/08/09_04:36:23 ERROR:  Generic error
ResourceManager(default)[17256]:        2016/08/09_04:36:23 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[20014]: 2016/08/09_04:36:23 INFO:  Running OK
ResourceManager(default)[17256]:        2016/08/09_04:36:23 CRIT: Resource STOP failure. Reboot required!
ResourceManager(default)[17256]:        2016/08/09_04:36:23 CRIT: Killing heartbeat ungracefully!


[root@test-backup ~]# drbdadm role data   #(after the primary node's server rebooted, verify on the backup node that it has taken over)

Primary/Unknown

[root@test-backup ~]# ip addr

……

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:15:e6:bb brd ff:ff:ff:ff:ff:ff
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fe15:e6bb/64 scope link
       valid_lft forever preferred_lft forever

[root@test-backup ~]# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  3.9G   13G  24% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-backup ~]# ls /drbd

lost+found  test111  test2  test222.txt  test3  test4  test5  test6  test7  test8  test9


Hot standby between the two primary nodes now works, but the nfs slave hangs when it tries to mount after a failover: the NFS server (nfs master active) keeps per-client mount state, so the NFS server has to be restarted after a switch. To automate this, add a script to heartbeat's haresources so that NFS is restarted on every failover.
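On the nfs clients (the slaves and web servers), a small watchdog run from cron can detect a hung mount and remount it, as suggested earlier. A minimal sketch, assuming the VIP export `10.96.20.8:/drbd`; the mountpoint used in the demo call is illustrative:

```shell
#!/bin/bash
# Sketch of an nfs-client watchdog: if reads on the mountpoint hang or fail,
# lazy-unmount it and mount it again. Run periodically from cron on each client.
check_and_remount() {
    local mountpoint=$1 export=$2
    # a hung NFS mount makes ls block forever, so bound it with timeout
    if timeout 5 ls "$mountpoint" >/dev/null 2>&1; then
        echo "ok: $mountpoint readable"
    else
        echo "stale: remounting $mountpoint"
        umount -lf "$mountpoint" 2>/dev/null
        mount -t nfs "$export" "$mountpoint"
    fi
}

# demo call against /tmp so the snippet runs anywhere;
# in production: check_and_remount /data 10.96.20.8:/drbd
check_and_remount /tmp 10.96.20.8:/drbd
```

The lazy unmount (`umount -lf`) detaches the dead mount immediately instead of blocking on the unreachable old server.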


Stop the drbd and heartbeat services on both primary nodes

[root@test-master ~]# vim /etc/ha.d/haresources

test-master    IPaddr::10.96.20.8/24/eth0    drbddisk::data    Filesystem::/dev/drbd0::/drbd::ext4    killnfs

[root@test-master ~]# cd /etc/ha.d/resource.d/

[root@test-master resource.d]# vim killnfs

---------------script start-------------
#!/bin/bash
# killnfs: run by heartbeat during failover. The NFS server keeps client
# state in the kernel nfsd threads, so kill them repeatedly to make sure
# they are gone, then start nfs fresh so clients can re-establish state.
for i in {1..10}; do
        killall nfsd
done
service nfs start
exit 0
----------------script end--------------

[root@test-master resource.d]# chmod 755 killnfs

[root@test-master resource.d]# ll killnfs

-rwxr-xr-x. 1 root root 79 Aug  9 21:02 killnfs

[root@test-master resource.d]# scp killnfs root@test-backup:/etc/ha.d/resource.d/

killnfs                                               100%   79     0.1KB/s   00:00

[root@test-master resource.d]# cd ..

[root@test-master ha.d]# scp haresources root@test-backup:/etc/ha.d/

haresources                                           100% 6003     5.9KB/s   00:00

With drbd back in order, start heartbeat again and retest: the nfs slave now rides through a primary switchover normally, with no failed mounts and no hangs

wKioL1etJibj9SkcAACNMbI8qo0924.jpg

Note: the key precondition when debugging is that drbd itself is healthy; only then start heartbeat, and there will be no problems
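That precondition can be enforced with a pre-start check. A sketch, using the `cstate`/`dstate` subcommands of drbdadm on DRBD 8.x; the resource name `data` comes from this setup:

```shell
#!/bin/bash
# Sketch: only start heartbeat when DRBD reports a healthy state.
# Healthy means connection state "Connected" and local disk state "UpToDate".
drbd_healthy() {
    local cstate=$1 dstate=$2   # e.g. "Connected" and "UpToDate/UpToDate"
    [ "$cstate" = "Connected" ] && [ "${dstate%%/*}" = "UpToDate" ]
}

# in production:
#   drbd_healthy "$(drbdadm cstate data)" "$(drbdadm dstate data)" \
#       && service heartbeat start || echo "fix drbd first"
if drbd_healthy "Connected" "UpToDate/UpToDate"; then
    echo "drbd healthy, safe to start heartbeat"
fi
```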


Note: evolution of the ganji image-serving architecture

wKioL1etJjXiRjytAABNwC_03q0381.jpg


wKioL1etJkWD7HrqAABclO3ZYg8364.jpg

Note: after a user uploads an image to a web server, the web server POSTs the image to the image server matching the configured ID; PHP on that image server receives the POSTed image, writes it to local disk, and returns a success status code; on success, the front-end web server writes the image server's ID and the image path into the DB server. When a user later requests a page, the web server reads the image server ID and the image URL from the DB and fetches the image from the corresponding image server
