nfs+drbd+heartbeat

nfs+drbd+heartbeat: this scheme applies wherever NFS, or a distributed store such as MFS, still has a single point of failure.

In real production environments, NFS is one of the most common storage solutions for small and medium-sized businesses. It is simple to deploy and easy to maintain: with inotify+rsync you get simple, efficient synchronization that replicates the NFS store to standby machines in a master/slave fashion, much like MySQL r/w splitting. Multiple read slaves can then be load-balanced with LVS or haproxy, which both spreads heavy concurrent read traffic and removes the slaves as single points of failure.
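The inotify+rsync pairing mentioned above boils down to a watch-and-push loop. A minimal sketch, assuming a slave reachable at 172.16.1.114 that exports an rsync daemon module named `data` (both the address and the module name are illustrative, not from the original text):

```shell
#!/bin/bash
# Watch the NFS export and push every change to a slave.
# Hypothetical slave address and rsync module; adjust to your environment.
WATCH_DIR="/drbd/"
SLAVE="172.16.1.114::data"

# Build the rsync command used for each push (factored out so it can be tested).
build_sync_cmd() {
    echo "rsync -az --delete $1 $2"
}

# Daemon loop: only runs when invoked with "run", so sourcing this file
# (e.g. for testing) does not block on inotifywait.
if [ "${1:-}" = "run" ]; then
    inotifywait -mrq -e create,delete,modify,move --format '%w%f' "$WATCH_DIR" |
    while read -r changed; do
        $(build_sync_cmd "$WATCH_DIR" "$SLAVE")
    done
fi
```

In production this loop is usually wrapped by sersync or a similar tool, which batches events and retries failed pushes.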

[figure]

Storage options for the web servers:

Option 1: reads and writes may land on any web server; inotify+rsync replicates each web server's data to the others (e.g. web1-->web2-->web3-->web2-->web1).

Option 2: configure the load balancer so that writes (file uploads) go only to web3 and reads go to web{1,2}; inotify+rsync then syncs web3-->web2 and web3-->web1.

Option 3: use shared NFS storage. A single server handling both reads and writes is a single point of failure, so add a second server as master/backup, syncing between them with inotify+rsync. You can put both reads and writes on one server and keep the other purely for backup, or write to the master and read from the backup. Since workloads are usually read-heavy, add more backups for a one-master/many-slave layout that spreads the read load, syncing master-->slave1 and master-->slave2. If a storage node fails, web{1,2,3} must remount; losing a slave is harmless, but losing the master stops writes. So give the master its own HA pair, master-active and master-inactive: only one of the two serves at a time, and master-inactive stays idle until a failover makes it active.

Option 4: drop the shared NFS mounts but keep master-active/master-inactive; write data only to the shared store, then sync it back out to the local disks of web{1,2,3}, and serve every read from local disk.

In the one-master/many-slave model, to keep writes available when the master dies and keep the slaves in sync, use nfs+drbd+heartbeat to make the master highly available and remove its single point of failure. When the master-active NFS node fails over to master-inactive, the two masters hold identical data, and master-inactive automatically resumes syncing to all the NFS slaves, giving the NFS storage system a hot standby.

When master-active fails over to master-inactive, the backup node must still be able to sync data to the NFS slaves, and it must sync only the changes made since the switch rather than everything. sersync can replace inotify here (using its -r option); alternatively, leave inotify stopped at first and start it only after heartbeat on the backup node has started and mounted the device.
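The "start the sync only after the mount is ready" variant can be sketched as a small gate; the mount point and service name below are illustrative assumptions:

```shell
#!/bin/bash
# Start the sync daemon only once heartbeat has mounted the DRBD device,
# so a freshly promoted standby never pushes an empty tree to the slaves.
MOUNT_POINT="/drbd"

mounted() {
    # Check the kernel mount table for the given mount point.
    awk -v m="$1" '$2 == m {found=1} END {exit !found}' /proc/mounts
}

if [ "${1:-}" = "wait" ]; then
    until mounted "$MOUNT_POINT"; do sleep 2; done
    service sersync start   # or launch the inotify+rsync loop here
fi
```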

Note: compared with MFS, FastDFS, GFS and other distributed filesystems, this scheme is simple to deploy and easy to operate, which fits the keep-it-simple principle. Its drawback is that every node holds a full copy of the data (as with MySQL replication), so syncing large volumes of files can lag. You can split the sync by data directory (similar to MySQL database/table splitting), run multiple sync instances and control the read/write logic in the application to work around latency, and you must be able to monitor the sync status.

For the NFS HA scheme, to keep the NFS clients (slaves) from hanging, unable to read, while the two master nodes switch over, attack the problem from these angles:

keep the rpcbind service running at all times (on the master node, the backup node, and every NFS client);

on each NFS client (NFS slave), monitor the locally mounted NFS share and remount it if reads start failing;

have the NFS clients watch for the VIP appearing on the master-inactive node, or for its drbd state turning Primary, and remount when that happens (during an NFS failover, trigger the remount on the clients over SSH or a similar channel; or monitor with nagios and, when the VIP shows up on the master-inactive node, run a script that remounts all the NFS clients).
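The client-side remount check described above can be sketched as follows; the mount point, NFS source and read probe are assumptions for illustration, and in practice the check would run from cron or a monitor such as nagios:

```shell
#!/bin/bash
# Remount an NFS share when a read probe on it fails.
# Mount point and NFS source below are illustrative.
MOUNT_POINT="/data"
NFS_SRC="10.96.20.8:/drbd"

# Probe: the mount is considered healthy if listing it succeeds quickly.
probe_mount() {
    timeout 5 ls "$1" > /dev/null 2>&1
}

remount_nfs() {
    umount -lf "$1" 2>/dev/null   # lazy+force, in case the mount is stuck
    mount -t nfs "$2" "$1"
}

if [ "${1:-}" = "check" ]; then
    probe_mount "$MOUNT_POINT" || remount_nfs "$MOUNT_POINT" "$NFS_SRC"
fi
```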


[figure]

As the figure shows, the ellipses mark what this section implements.

[figure]

注:單臺server,無需文件存儲,數據放本地,只有做集群的情況下才需要做專門的存儲

[figure]

Note: the problems here are the single point of failure and the poor performance of putting reads and writes on one box; in operations you must always consider data protection and 7x24 continuous service.

[figure]

Note:

web1 and web2 typically run the LNMP stack;

IMG1 and IMG2 typically run nginx or lighttpd;

This design removes the nfs master as a single point of failure and solves concurrent-read performance, but as concurrent writes keep growing it runs into the following problems:

It handles roughly 200-300 image uploads per second with acceptable sync throughput; above 300/s the master-to-slave sync may start to lag. Remedies: sync with multiple threads, and tune event monitoring, disk I/O and network I/O;

With many IMG servers there is still only one master, which both takes every write and feeds every slave, so its load becomes very heavy;

When the image volume grows very large, every node still holds the complete data set, and beyond roughly 3 TB a single server may run out of space. Remedies: (1) borrow the MySQL database-splitting idea to solve capacity, write performance and sync lag together, e.g. start with five directories img1--img5 mapped to five domains, mount those five directories, and grow each imgNUM into its own nfs master/slave HA cluster with r/w splitting (the r/w split can be done via POST or WebDAV); (2) extend to a multi-master architecture via DNS, though every added service is another single point to manage; (3) use the internal replication of MySQL, Oracle, MongoDB, Cassandra and similar databases to sync the file data; iQiyi uses MongoDB GridFS for image storage.
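The directory split in remedy (1) rests on a stable mapping from file name to shard. A minimal sketch, assuming five shards img1--img5 chosen by hashing the file name with cksum (the hash choice and shard count are illustrative):

```shell
#!/bin/bash
# Map an image file name to one of N shard directories (img1..img5),
# so the same name always lands in the same NFS cluster.
SHARDS=5

shard_for() {
    # cksum gives a stable CRC of the name; take it modulo the shard count.
    local crc
    crc=$(printf '%s' "$1" | cksum | awk '{print $1}')
    echo "img$(( crc % SHARDS + 1 ))"
}
```

Each imgN then maps to its own domain and its own nfs master/slave HA cluster, as described above.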

注:mongodbGridFS做圖片存儲(支持分布式,設計思路:圖片存儲唯一;只存原始圖;首次請求生成縮略圖并生成靜態文件;url固定,根據不同url產生縮略圖;參考Abusing Amazon p_w_picpaths

注:facebook圖片管理架構

[figure]

Note:

The nfs HA pair removes the single point of failure at the cost of one idle server;

the two nfs masters are tied together by heartbeat+drbd, replicating in real time over drbd protocol C;

nfs(M) and nfs(S) sync asynchronously via inotify+rsync, the slaves pulling from nfs(M) through the VIP; the nfs slaves serve reads and the nfs master takes writes, which solves concurrent-read performance; alternatively, let nfs(M) take writes only and push the data out to the app servers (dropping the nfs mounts entirely);

choose RAID10 or RAID0 for the physical disks according to your performance and redundancy needs; connect the servers to each other and to the switch with bonded dual gigabit NICs; the application servers (web and otherwise) reach nfs(M) through one VIP and the load-balanced nfs(S) storage pool through another; nfs(M) keeps its data on the drbd partition;

when the data volume is small, sync it straight from nfs(M) to the app servers' local disks, serve every read locally, and send writes to nfs(M);

with inotify+rsync handling the master-->slave sync, heavy concurrent writes can cause lag or missed updates.

Note:

In real-world practice, touch the DB and file-storage layers only as a last resort. Adjust the site architecture first so user requests hit the DB and storage as little as possible, e.g. with file caches and data caches (the core rule of high concurrency: push every user request as far toward the front as you can). Do not reach for a distributed storage system up front; for a small or mid-sized business that is using a cannon to kill a mosquito. In 2012, when Facebook was already very large, it was still using NFS storage. Distribution is not a silver bullet: it consumes a great deal of people and hardware, and handled badly it ends in disaster.

[figure]

Note:

To relieve pressure on the site, push the content users fetch as far forward as possible: whatever can live on the user's machine should not go to the CDN, and whatever the CDN can hold should not go to your own servers. Use the cache at every layer, and let users hit the backend DB only as a last resort. If that still isn't enough, use SSD+SATA tiering; only if that fails, use distributed storage.


1. Install and configure heartbeat

Environment:

VIP: 10.96.20.8

master: eth0 (10.96.20.113), eth1 (172.16.1.113, no gateway or DNS), hostname test-master

backup: eth0 (10.96.20.114), eth1 (172.16.1.114, no gateway or DNS), hostname test-backup

two NICs and two disks per node.

Note: eth0 is the management IP; eth1 carries the heartbeat and the drbd replication traffic. In production, if the heartbeat and data share one NIC, rate-limit the data traffic so the heartbeat always has bandwidth.

Note: label your VMs in VMware and your tabs in Xshell consistently; in production, every host should have an entry in /etc/hosts to ease distribution, management and maintenance.

On test-master (on both nodes: set the hostname in /etc/sysconfig/network so it exactly matches the output of uname -n, fill in /etc/hosts, set up mutual ssh trust, sync time, and sort out iptables and selinux):

[root@test-master ~]# cat /etc/redhat-release

Red Hat Enterprise Linux Server release 6.5 (Santiago)

[root@test-master ~]# uname -rm

2.6.32-431.el6.x86_64 x86_64

[root@test-master ~]# uname -n

test-master

[root@test-master ~]# ifconfig | grep eth0 -A 1

eth0      Link encap:Ethernet  HWaddr 00:0C:29:1F:B6:AC

          inet addr:10.96.20.113  Bcast:10.96.20.255  Mask:255.255.255.0

[root@test-master ~]# ifconfig | grep eth1 -A 1

eth1      Link encap:Ethernet  HWaddr 00:0C:29:1F:B6:B6

          inet addr:172.16.1.113  Bcast:172.16.1.255  Mask:255.255.255.0

[root@test-master ~]# route add -host 172.16.1.114 dev eth1   #(add a host route so heartbeat traffic leaves through the given NIC; append this command to /etc/rc.local, or configure a static route: #vim /etc/sysconfig/network-scripts/route-eth1 and add 172.16.1.114/24 via 172.16.1.113)

[root@test-master ~]# ssh-keygen -t rsa -f ./.ssh/id_rsa -P ''

Generating public/private rsa key pair.

Your identification has been saved in ./.ssh/id_rsa.

Your public key has been saved in ./.ssh/id_rsa.pub.

The key fingerprint is:

29:c3:a3:68:81:43:59:2f:0a:ad:8a:54:56:b0:1e:12 root@test-master

The key's randomart image is:

+--[ RSA 2048]----+

| E o..          |

| .+ +           |

|.+.* .          |

|oo* o.  .       |

|+o..  =S        |

|+. o . +        |

|o o .           |

| .              |

|                |

+-----------------+

[root@test-master ~]# ssh-copy-id -i ./.ssh/id_rsa root@test-backup

The authenticity of host 'test-backup (10.96.20.114)' can't be established.

RSA key fingerprint is 63:f5:2e:dc:96:64:54:72:8e:14:7e:ec:ef:b8:a1:0c.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'test-backup' (RSA) to the list of known hosts.

root@test-backup's password:

Now try logging into the machine, with "ssh 'root@test-backup'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@test-master ~]# crontab -l

*/5 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null

[root@test-master ~]# service crond restart

Stopping crond:                                            [  OK  ]

Starting crond:                                            [  OK  ]

[root@test-master ~]# wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm

[root@test-master ~]# rpm -ivh epel-release-6-8.noarch.rpm

warning: epel-release-6-8.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY

Preparing...                ########################################### [100%]

   1:epel-release           ########################################### [100%]

[root@test-master ~]# yum search heartbeat

……

heartbeat-devel.i686 : Heartbeat development package

heartbeat-devel.x86_64 : Heartbeat development package

heartbeat-libs.i686 : Heartbeat libraries

heartbeat-libs.x86_64 : Heartbeat libraries

heartbeat.x86_64 : Messaging and membership subsystem for High-Availability Linux

[root@test-master ~]# yum -y install heartbeat

[root@test-master ~]# chkconfig heartbeat off

[root@test-master ~]# chkconfig --list heartbeat

heartbeat       0:off   1:off   2:off   3:off   4:off   5:off   6:off

On test-backup:

[root@test-backup ~]# uname -n

test-backup

[root@test-backup ~]# ifconfig | grep eth0 -A 1

eth0      Link encap:Ethernet  HWaddr 00:0C:29:15:E6:BB

          inet addr:10.96.20.114  Bcast:10.96.20.255  Mask:255.255.255.0

[root@test-backup ~]# ifconfig | grep eth1 -A 1

eth1      Link encap:Ethernet  HWaddr 00:0C:29:15:E6:C5

          inet addr:172.16.1.114  Bcast:172.16.1.255  Mask:255.255.255.0

[root@test-backup ~]# route add -host 172.16.1.113 dev eth1

[root@test-backup ~]# ssh-keygen -t rsa -f ./.ssh/id_rsa -P ''

Generating public/private rsa key pair.

Your identification has been saved in./.ssh/id_rsa.

Your public key has been saved in ./.ssh/id_rsa.pub.

The key fingerprint is:

08:ea:6a:44:7f:1a:c9:bf:ff:01:d5:32:e5:39:1b:b8 root@test-backup

The key's randomart image is:

+--[ RSA 2048]----+

|          .     |

|           =.   |

|    .     = *   |

| . . . .. + +   |

|. + . ..SE .    |

| o = .  .       |

|. . =    .      |

| o . .    .     |

|o    .o...      |

+-----------------+

[root@test-backup ~]# ssh-copy-id -i ./.ssh/id_rsa root@test-master

The authenticity of host 'test-master (10.96.20.113)' can't be established.

RSA key fingerprint is 63:f5:2e:dc:96:64:54:72:8e:14:7e:ec:ef:b8:a1:0c.

Are you sure you want to continue connecting (yes/no)? yes

Warning: Permanently added 'test-master' (RSA) to the list of known hosts.

root@test-master's password:

Now try logging into the machine, with "ssh 'root@test-master'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

[root@test-backup ~]# crontab -l

*/5 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null

[root@test-backup ~]# service crond restart

Stopping crond:                                            [  OK  ]

Starting crond:                                            [  OK  ]

[root@test-backup ~]# wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm

[root@test-backup ~]# rpm -ivh epel-release-6-8.noarch.rpm

[root@test-backup ~]# yum -y install heartbeat

[root@test-backup ~]# chkconfig heartbeat off

[root@test-backup ~]# chkconfig --list heartbeat

heartbeat       0:off   1:off   2:off   3:off   4:off   5:off   6:off

On test-master:

[root@test-master ~]# cp /usr/share/doc/heartbeat-3.0.4/{ha.cf,authkeys,haresources} /etc/ha.d/

[root@test-master ~]# cd /etc/ha.d

[root@test-master ha.d]# ls

authkeys?ha.cf? harc? haresources?rc.d? README.config? resource.d?shellfuncs

[root@test-master ha.d]# vim authkeys   #(generate a random string with #dd if=/dev/random count=1 bs=512 | md5sum, and put it after sha1)

auth 1

1 sha1 912d6402295ac8d47109e56b177073b9

[root@test-master ha.d]# chmod 600 authkeys   #(this file must be mode 600, otherwise the service refuses to start)

[root@test-master ha.d]# ll !$

ll authkeys

-rw-------. 1 root root 692 Aug  7 21:51 authkeys

[root@test-master ha.d]# vim ha.cf

debugfile /var/log/ha-debug   #(debug log)

logfile /var/log/ha-log

logfacility     local1   #(configure the rsyslog service to receive these logs via local1)

keepalive 2   #(heartbeat interval: send a heartbeat every 2s)

deadtime 30   #(if the backup node hears nothing from the master for 30s, it immediately takes over the service resources)

warntime 10   #(heartbeat-delay warning: after 10s without a heartbeat from the master, the backup node logs a warning but does not fail over)

initdead 120   #(after heartbeat first starts, wait 120s before bringing up resources on the master node; this gives the peer's heartbeat time to start, and the value must be at least twice deadtime)

udpport 694

#bcast  eth0   #(broadcast heartbeats over Ethernet on eth0; to carry the heartbeat over two physical networks use bcast eth0 eth1)

mcast eth0 225.0.0.11 694 1 0   #(multicast parameters; the multicast group must be unique on the LAN since several heartbeat clusters may coexist; use a class D address (224.0.0.0--239.255.255.255); format: mcast dev mcast_group port ttl loop)

auto_failback on   #(fail back once the master node recovers)

node test-master   #(master node hostname, the output of uname -n)

node test-backup   #(backup node hostname)

crm no   #(whether to enable the CRM)

[root@test-master ha.d]# vim haresources

test-master     IPaddr::10.96.20.8/24/eth0   #(this entry is equivalent to running #/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 stop|start; IPaddr is the script under /etc/ha.d/resource.d/)

[root@test-master ha.d]# scp authkeys ha.cf haresources root@test-backup:/etc/ha.d/

authkeys                          100%  692     0.7KB/s   00:00

ha.cf                             100%   10KB  10.3KB/s   00:00

haresources                       100% 5944     5.8KB/s   00:00

[root@test-master ha.d]# service heartbeat start

Starting High-Availability services: INFO:  Resource is stopped

Done.


[root@test-master ha.d]# ssh test-backup 'service heartbeat start'

Starting High-Availability services: 2016/08/07_22:39:00 INFO:  Resource is stopped

Done.

[root@test-master ha.d]# ps aux | grep heartbeat

root      63089  0.0  3.1  50124  7164 ?        SLs  22:38   0:00 heartbeat: master control process

root      63093  0.0  3.1  50076  7116 ?        SL   22:38   0:00 heartbeat: FIFO reader

root      63094  0.0  3.1  50072  7112 ?        SL   22:38   0:00 heartbeat: write: mcast eth0

root      63095  0.0  3.1  50072  7112 ?        SL   22:38   0:00 heartbeat: read: mcast eth0

root      63136  0.0  0.3 103264   836 pts/0    S+   22:39   0:00 grep heartbeat

[root@test-master ha.d]# ssh test-backup 'ps aux | grep heartbeat'

root       3050  0.0  3.1  50124  7164 ?        SLs  22:39   0:00 heartbeat: master control process

root       3054  0.0  3.1  50076  7116 ?        SL   22:39   0:00 heartbeat: FIFO reader

root       3055  0.0  3.1  50072  7112 ?        SL   22:39   0:00 heartbeat: write: mcast eth0

root       3056  0.0  3.1  50072  7112 ?        SL   22:39   0:00 heartbeat: read: mcast eth0

root       3094  0.0  0.5 106104  1368 ?        Ss   22:39   0:00 bash -c ps aux | grep heartbeat

root       3108  0.0  0.3 103264   832 ?        S    22:39   0:00 grep heartbeat

[root@test-master ha.d]# netstat -tnulp | grep heartbeat

udp        0      0 225.0.0.11:694              0.0.0.0:*                               63094/heartbeat: wr

udp        0      0 0.0.0.0:50268               0.0.0.0:*                               63094/heartbeat: wr

[root@test-master ha.d]# ssh test-backup 'netstat -tnulp | grep heartbeat'

udp        0      0 0.0.0.0:58019               0.0.0.0:*                               3055/heartbeat: wri

udp        0      0 225.0.0.11:694              0.0.0.0:*                               3055/heartbeat: wri

[root@test-master ha.d]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

[root@test-master ha.d]# service heartbeat stop

Stopping High-Availability services: Done.


[root@test-master ha.d]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ha.d]# service heartbeat start

Starting High-Availability services: INFO:  Resource is stopped

Done.


[root@test-master ha.d]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

[root@test-master ~]# service heartbeat stop

Stopping High-Availability services: Done.


[root@test-master ~]# ssh test-backup 'service heartbeat stop'

Stopping High-Availability services: Done.

2. Install and configure drbd

On test-master:

[root@test-master ~]# fdisk -l

……

Disk /dev/sdb: 2147 MB, 2147483648 bytes

255 heads, 63 sectors/track, 261 cylinders

Units = cylinders of 16065 * 512 = 8225280 bytes

Sector size (logical/physical): 512 bytes / 512 bytes

I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk identifier: 0x00000000

[root@test-master ~]# parted /dev/sdb   #(parted handles disks larger than 2 TB; split the new disk into two partitions, one for data and one for drbd's meta data)

GNU Parted 2.1

Using /dev/sdb

Welcome to GNU Parted! Type 'help' to view a list of commands.

(parted) h

  align-check TYPE N                        check partition N for TYPE(min|opt) alignment

  check NUMBER                              do a simple check on the file system

  cp [FROM-DEVICE] FROM-NUMBER TO-NUMBER    copy file system to another partition

  help [COMMAND]                            print general help, or help on COMMAND

  mklabel,mktable LABEL-TYPE                create a new disklabel (partition table)

  mkfs NUMBER FS-TYPE                       make a FS-TYPE file system on partition NUMBER

  mkpart PART-TYPE [FS-TYPE] START END      make a partition

  mkpartfs PART-TYPE FS-TYPE START END      make a partition with a file system

  move NUMBER START END                     move partition NUMBER

  name NUMBER NAME                          name partition NUMBER as NAME

  print [devices|free|list,all|NUMBER]      display the partition table, available devices, free space, all found partitions, or a particular partition

  quit                                      exit program

  rescue START END                          rescue a lost partition near START and END

  resize NUMBER START END                   resize partition NUMBER and its file system

  rm NUMBER                                 delete partition NUMBER

  select DEVICE                             choose the device to edit

  set NUMBER FLAG STATE                     change the FLAG on partition NUMBER

  toggle [NUMBER [FLAG]]                    toggle the state of FLAG on partition NUMBER

  unit UNIT                                 set the default unit to UNIT

  version                                   display the version number and copyright information of GNU Parted

(parted) mklabel gpt

(parted) mkpart primary 0 1024

Warning: The resulting partition is not properly aligned for best performance.

Ignore/Cancel? Ignore

(parted) mkpart primary 1025 2147

Warning: The resulting partition is not properly aligned for best performance.

Ignore/Cancel? Ignore

(parted) p

Model: VMware, VMware Virtual S (scsi)

Disk /dev/sdb: 2147MB

Sector size (logical/physical): 512B/512B

Partition Table: gpt


Number  Start   End     Size    File system  Name     Flags

 1      17.4kB  1024MB  1024MB               primary

 2      1025MB  2147MB  1122MB               primary

[root@test-master ~]# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm

[root@test-master ~]# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm

warning: elrepo-release-6-6.el6.elrepo.noarch.rpm: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY

Preparing...                ########################################### [100%]

   1:elrepo-release         ########################################### [100%]

[root@test-master ~]# yum -y install drbd kmod-drbd84

[root@test-master ~]# modprobe drbd

FATAL: Module drbd not found.

[root@test-master ~]# yum -y install kernel*   #(reboot the system after updating the kernel)

[root@test-master ~]# uname -r

2.6.32-642.3.1.el6.x86_64

[root@test-master ~]# depmod

[root@test-master ~]# lsmod | grep drbd

drbd                  372759  0

libcrc32c               1246  1 drbd

[root@test-master ~]# ll /usr/src/kernels/

total 12

drwxr-xr-x. 22 root root 4096 Mar 31 06:46 2.6.32-431.el6.x86_64

drwxr-xr-x. 22 root root 4096 Aug  8 03:40 2.6.32-642.3.1.el6.x86_64

drwxr-xr-x. 22 root root 4096 Aug  8 03:40 2.6.32-642.3.1.el6.x86_64.debug

[root@test-master ~]# echo "modprobe drbd >/dev/null 2>&1" > /etc/sysconfig/modules/drbd.modules

[root@test-master ~]# cat !$

cat /etc/sysconfig/modules/drbd.modules

modprobe drbd > /dev/null 2>&1

On test-backup:

[root@test-backup ~]# parted /dev/sdb

(parted) mklabel gpt

(parted) mkpart primary 0 4096

Warning: The resulting partition is not properly aligned for best performance.

Ignore/Cancel? Ignore

(parted) mkpart primary 4097 5368

(parted) p

Model: VMware, VMware Virtual S (scsi)

Disk /dev/sdb: 5369MB

Sector size (logical/physical): 512B/512B

Partition Table: gpt


Number  Start   End     Size    File system  Name     Flags

 1      17.4kB  4096MB  4096MB               primary

 2      4097MB  5368MB  1271MB               primary

[root@test-backup ~]# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm

[root@test-backup ~]# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm

[root@test-backup ~]# ll /etc/yum.repos.d/

total 20

-rw-r--r--. 1 root root 1856 Jul 19 00:28 CentOS6-Base-163.repo

-rw-r--r--. 1 root root 2150 Feb  9  2014 elrepo.repo

-rw-r--r--. 1 root root  957 Nov  4  2012 epel.repo

-rw-r--r--. 1 root root 1056 Nov  4  2012 epel-testing.repo

-rw-r--r--. 1 root root  529 Mar 30 23:00 rhel-source.repo.bak

[root@test-backup ~]# yum -y install drbd kmod-drbd84

[root@test-backup ~]# yum -y install kernel*

[root@test-backup ~]# depmod

[root@test-backup ~]# lsmod | grep drbd

drbd                  372759  0

libcrc32c               1246  1 drbd

[root@test-backup ~]# chkconfig drbd off

[root@test-backup ~]# chkconfig --list drbd

drbd            0:off   1:off   2:off   3:off   4:off   5:off   6:off

[root@test-backup ~]# echo "modprobe drbd >/dev/null 2>&1" > /etc/sysconfig/modules/drbd.modules

[root@test-backup ~]# cat !$

cat /etc/sysconfig/modules/drbd.modules

modprobe drbd > /dev/null 2>&1

On test-master:

[root@test-master ~]# vim /etc/drbd.d/global_common.conf

[root@test-master ~]# egrep -v "#|^$" /etc/drbd.d/global_common.conf

global {

        usage-count no;

}

common {

        handlers {

        }

        startup {

        }

        options {

        }

        disk {

                on-io-error detach;

        }

        net {

        }

        syncer {

                rate 50M;

                verify-alg crc32c;

        }

}

[root@test-master ~]# vim /etc/drbd.d/data.res

resource data {

        protocol C;

        on test-master {

                device  /dev/drbd0;

                disk    /dev/sdb1;

                address 172.16.1.113:7788;

                meta-disk       /dev/sdb2[0];

        }

        on test-backup {

                device  /dev/drbd0;

                disk    /dev/sdb1;

                address 172.16.1.114:7788;

                meta-disk       /dev/sdb2[0];

        }

}

[root@test-master ~]# cd /etc/drbd.d

[root@test-master drbd.d]# scp global_common.conf data.res root@test-backup:/etc/drbd.d/

global_common.conf                100% 2144     2.1KB/s   00:00

data.res                          100%  251     0.3KB/s   00:00


[root@test-master drbd.d]# drbdadm --help

USAGE: drbdadm COMMAND [OPTION...]{all|RESOURCE...}

GENERAL OPTIONS:

  --stacked, -S

  --dry-run, -d

  --verbose, -v

  --config-file=..., -c ...

  --config-to-test=..., -t ...

  --drbdsetup=..., -s ...

  --drbdmeta=..., -m ...

  --drbd-proxy-ctl=..., -p ...

  --sh-varname=..., -n ...

  --peer=..., -P ...

  --version, -V

  --setup-option=..., -W ...

  --help, -h


COMMANDS:

 attach                             disk-options

 detach                             connect

 net-options                        disconnect

 up                                 resource-options

 down                               primary

 secondary                          invalidate

 invalidate-remote                  outdate

 resize                             verify

 pause-sync                         resume-sync

 adjust                             adjust-with-progress

 wait-connect                       wait-con-int

 role                               cstate

 dstate                             dump

 dump-xml                           create-md

 show-gi                            get-gi

 dump-md                            wipe-md

 apply-al                           hidden-commands

[root@test-master drbd.d]# drbdadm create-md data

initializing activity log

NOT initializing bitmap

Writing meta data...

New drbd meta data block successfully created.

[root@test-master drbd.d]# ssh test-backup 'drbdadm create-md data'

NOT initializing bitmap

initializing activity log

Writing meta data...

New drbd meta data block successfully created.

[root@test-master drbd.d]# drbdadm up data

[root@test-master drbd.d]# ssh test-backup 'drbdadm up data'

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----

    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:999984

[root@test-master drbd.d]# ssh test-backup 'cat /proc/drbd'

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----

    ns:0 nr:0 dw:0 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:999984

[root@test-master drbd.d]# drbdadm -- --overwrite-data-of-peer primary data   #(run only on the master)

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

    ns:339968 nr:0 dw:0 dr:340647 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:660016

        [=====>..............] sync'ed: 34.3% (660016/999984)K

        finish: 0:00:15 speed: 42,496 (42,496) K/sec

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

    ns:630784 nr:0 dw:0 dr:631463 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:369200

        [===========>........] sync'ed: 63.3% (369200/999984)K

        finish: 0:00:09 speed: 39,424 (39,424) K/sec

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----

    ns:942080 nr:0 dw:0 dr:942759 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:57904

        [=================>..] sync'ed: 94.3% (57904/999984)K

        finish: 0:00:01 speed: 39,196 (39,252) K/sec

[root@test-master drbd.d]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

    ns:999983 nr:0 dw:0 dr:1000662 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-master drbd.d]# ssh test-backup 'cat /proc/drbd'

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----

    ns:0 nr:999983 dw:999983 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-master drbd.d]# mkdir /drbd

[root@test-master drbd.d]# ssh test-backup 'mkdir /drbd'

[root@test-master drbd.d]# mkfs.ext4 -b 4096 /dev/drbd0   #(run only on the master; never format the meta partition)

Writing superblocks and filesystem accounting information: done

[root@test-master drbd.d]# tune2fs -c -1 /dev/drbd0

tune2fs 1.41.12 (17-May-2010)

Setting maximal mount count to -1

[root@test-master drbd.d]# mount /dev/drbd0 /drbd

[root@test-master drbd.d]# cd /drbd

[root@test-master drbd]# for i in `seq 1 10`; do touch test$i; done

[root@test-master drbd]# ls

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9

[root@test-master drbd]# cd

[root@test-master ~]# umount /dev/drbd0

[root@test-master ~]# drbdadm secondary data

[root@test-master ~]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----

    ns:1032538 nr:0 dw:32554 dr:1001751 al:19 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

On test-backup:

[root@test-backup ~]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----

    ns:0 nr:1032538 dw:1032538 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-backup ~]# drbdadm primary data

[root@test-backup ~]# cat /proc/drbd

version: 8.4.7-1 (api:1/proto:86-101)

GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11

 0:cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----

    ns:0 nr:1032538 dw:1032538 dr:679 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0

[root@test-backup ~]# mount /dev/drbd0 /drbd

[root@test-backup ~]# ls /drbd

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9

3. Test heartbeat+drbd

[root@test-master ~]# ssh test-backup 'umount /drbd'

[root@test-master ~]# ssh test-backup 'drbdadm secondary data'

[root@test-master ~]# service drbd stop

Stopping all DRBD resources: .

[root@test-master ~]# ssh test-backup 'service drbd stop'

Stopping all DRBD resources: .

[root@test-master ~]# service heartbeat status

heartbeat is stopped. No process

[root@test-master ~]# ssh test-backup 'service heartbeat status'

heartbeat is stopped. No process

[root@test-master ~]# ll /etc/ha.d/resource.d/{Filesystem,drbddisk}

-rwxr-xr-x. 1 root root 3162 Jan 12  2016 /etc/ha.d/resource.d/drbddisk

-rwxr-xr-x. 1 root root 1903 Dec  2  2013 /etc/ha.d/resource.d/Filesystem

[root@test-master ~]# vim /etc/ha.d/haresources   #(each entry works like a script plus its arguments, e.g. #/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 start|stop, #/etc/ha.d/resource.d/drbddisk data start|stop, #/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext4 start|stop; heartbeat drives the resources in exactly this configured order, so if heartbeat misbehaves, check the logs and run these commands by hand to narrow down the fault)

test-master     IPaddr::10.96.20.8/24/eth0      drbddisk::data  Filesystem::/dev/drbd0::/drbd::ext4

[root@test-master ~]# scp /etc/ha.d/haresources root@test-backup:/etc/ha.d/

haresources                       100% 5996     5.9KB/s   00:00

[root@test-master ~]# service drbd start   #(run on the master node)

Starting DRBD resources: [

     create res: data

   prepare disk: data

    adjust disk: data

     adjust net: data

]

..........

***************************************************************

 DRBD's startup script waits for the peer node(s) to appear.

 - If this node was already a degraded cluster before the

   reboot, the timeout is 0 seconds. [degr-wfc-timeout]

 - If the peer was available before the reboot, the timeout

   is 0 seconds. [wfc-timeout]

   (These values are for resource 'data'; 0 sec -> wait forever)

 To abort waiting enter 'yes' [  23]:

[root@test-backup ~]# service drbd start   #(run on the backup node)

Starting DRBD resources: [

     create res: data

   prepare disk: data

    adjust disk: data

     adjust net: data

]

.

[root@test-master ~]# drbdadm role data

Secondary/Secondary

[root@test-master ~]# ssh test-backup 'drbdadm role data'

Secondary/Secondary

[root@test-master ~]# drbdadm -- --overwrite-data-of-peer primary data

[root@test-master ~]# drbdadm role data

Primary/Secondary

[root@test-master ~]# service heartbeat start

Starting High-Availability services: INFO:  Resource is stopped

Done.

[root@test-master ~]# ssh test-backup 'service heartbeat start'

Starting High-Availability services: 2016/08/09_03:08:11 INFO:  Resource is stopped

Done.

[root@test-master ~]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ~]# drbdadm role data

Primary/Secondary

[root@test-master ~]# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  6.3G   11G  38% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-master ~]# ls /drbd

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9


[root@test-master ~]# service heartbeat stop

Stopping High-Availability services: Done.

[root@test-master ~]# ssh test-backup 'ip addr |grep 10.96.20'

    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ~]# ssh test-backup 'df -h'

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  3.9G   13G  24% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-master ~]# ssh test-backup 'ls /drbd'

lost+found

test1

test10

test2

test3

test4

test5

test6

test7

test8

test9


[root@test-master ~]# drbdadm role data

Secondary/Primary

[root@test-master ~]# service heartbeat start   #(after a node recovers, first make sure drbd is back in order and healthy, then start heartbeat)

Starting High-Availability services: INFO:  Resource is stopped

Done.

[root@test-master ~]# drbdadm role data

Primary/Secondary

[root@test-master ~]# ip addr | grep 10.96.20

    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0

    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0

[root@test-master ~]# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  6.3G   11G  38% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-master ~]# ls /drbd

lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9


Note: if the two ends show Primary/Unknown and Secondary/Unknown (DRBD split brain), recover as follows:

#service heartbeat stop   #(stop the heartbeat service on both ends)

#drbdadm secondary data   #(on the backup node, demote drbd to Secondary)

#drbdadm disconnect data   #(on the backup node)

#drbdadm -- --discard-my-data connect data   #(on the backup node, discard its changes and reconnect)

#drbdadm role data

#drbdadm connect data   #(run on the primary node)
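The manual steps above can be wrapped in a small helper. This is only a sketch: the resource name `data` comes from this setup, and the string being parsed is the `LocalRole/PeerRole` output format of `drbdadm role` on DRBD 8.x.

```shell
#!/bin/bash
# Sketch: print the split-brain recovery commands for this node, based on
# the local role reported by `drbdadm role data` ("LocalRole/PeerRole").
# The Secondary side discards its changes; the Primary side just reconnects.
split_brain_steps() {
    local role=$1
    case ${role%%/*} in
        Secondary) echo "drbdadm disconnect data; drbdadm -- --discard-my-data connect data" ;;
        Primary)   echo "drbdadm connect data" ;;
        *)         echo "unexpected role: $role" ;;
    esac
}

# in production: split_brain_steps "$(drbdadm role data)"
split_brain_steps "Secondary/Unknown"
```

Run this on each node and execute the printed commands on that node, backup side first.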


4. Install and configure NFS

Do the following on both primary nodes and on nfs slave1:

[root@test-master ~]# yum -y groupinstall 'NFS file server'

[root@test-master ~]# rpm -qa nfs-utils rpcbind

nfs-utils-1.2.3-70.el6_8.1.x86_64

rpcbind-0.2.0-12.el6.x86_64

[root@test-master ~]# service rpcbind start

[root@test-master ~]# service nfs start

Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]

[root@test-master ~]# chkconfig rpcbind on

[root@test-master ~]# chkconfig nfs on

[root@test-master ~]# chkconfig --list rpcbind

rpcbind         0:off 1:off 2:on 3:on 4:on 5:on 6:off

[root@test-master ~]# chkconfig --list nfs

nfs             0:off 1:off 2:on 3:on 4:on 5:on 6:off


On both primary nodes:

[root@test-master ~]# vim /etc/exports

/drbd   10.96.20.*(rw,sync,all_squash,anonuid=65534,anongid=65534,mp,fsid=2)

[root@test-master ~]# chmod 777 -R /drbd

[root@test-master ~]# service nfs reload   #(equivalent to exportfs -r)
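Each option in the export line above plays a role in failover behaviour; an annotated copy (same values as above):

```shell
# /etc/exports
/drbd  10.96.20.*(rw,sync,all_squash,anonuid=65534,anongid=65534,mp,fsid=2)
# rw,sync       writable; reply to clients only after data reaches disk
# all_squash    map every client user to the anonymous account
# anonuid/gid   anonymous account uid/gid 65534 (nfsnobody)
# mp            export only while /drbd is actually a mounted filesystem,
#               i.e. only while drbd is Primary and mounted on this node
# fsid=2        fixed filesystem id, so both primaries hand out identical
#               file handles and clients survive a VIP switch
```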


5. Testing:

Start heartbeat on both primary nodes.

Mount and test from the nfs slave: works normally.

wKioL1etJfKA0JZwAAD7Xi1PiMw060.jpg

[root@test-master ~]# service heartbeat stop

Stopping High-Availability services:

/sbin/service: line 66: 17235 Killed                  env -i PATH="$PATH" TERM="$TERM" "${SERVICEDIR}/${SERVICE}" ${OPTIONS}

[root@test-master ~]# tail -f /var/log/ha-log   #(during this heartbeat-stop test, the mounted partition could not be unmounted during the switchover, and the server was ultimately force-rebooted)

Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:21 INFO: No processes on /drbd were signalled. force_unmount is
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:22 ERROR: Couldn't unmount /drbd; trying cleanup with KILL
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:22 INFO: No processes on /drbd were signalled. force_unmount is
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:23 ERROR: Couldn't unmount /drbd, giving up!
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[19783]: 2016/08/09_04:36:23 ERROR:  Generic error
ResourceManager(default)[17256]:        2016/08/09_04:36:23 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[20014]: 2016/08/09_04:36:23 INFO:  Running OK
ResourceManager(default)[17256]:        2016/08/09_04:36:23 CRIT: Resource STOP failure. Reboot required!
ResourceManager(default)[17256]:        2016/08/09_04:36:23 CRIT: Killing heartbeat ungracefully!


[root@test-backup ~]# drbdadm role data   #(after the primary node's server rebooted, verify on the backup node that it has taken over)

Primary/Unknown

[root@test-backup ~]# ip addr

……

2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:15:e6:bb brd ff:ff:ff:ff:ff:ff
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fe15:e6bb/64 scope link
       valid_lft forever preferred_lft forever

[root@test-backup ~]# df -h

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  3.9G   13G  24% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd

[root@test-backup ~]# ls /drbd

lost+found  test111  test2  test222.txt  test3  test4  test5  test6  test7  test8  test9


Hot standby between the two primary nodes now works, but the nfs slave hangs when it tries to mount after a failover: the NFS server (nfs master active) keeps per-client mount state, so the NFS server has to be restarted after a switch. To automate this, add a script to heartbeat's haresources so that NFS is restarted on every failover.
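On the nfs clients (the slaves and web servers), a small watchdog run from cron can detect a hung mount and remount it, as suggested earlier. A minimal sketch, assuming the VIP export `10.96.20.8:/drbd`; the mountpoint used in the demo call is illustrative:

```shell
#!/bin/bash
# Sketch of an nfs-client watchdog: if reads on the mountpoint hang or fail,
# lazy-unmount it and mount it again. Run periodically from cron on each client.
check_and_remount() {
    local mountpoint=$1 export=$2
    # a hung NFS mount makes ls block forever, so bound it with timeout
    if timeout 5 ls "$mountpoint" >/dev/null 2>&1; then
        echo "ok: $mountpoint readable"
    else
        echo "stale: remounting $mountpoint"
        umount -lf "$mountpoint" 2>/dev/null
        mount -t nfs "$export" "$mountpoint"
    fi
}

# demo call against /tmp so the snippet runs anywhere;
# in production: check_and_remount /data 10.96.20.8:/drbd
check_and_remount /tmp 10.96.20.8:/drbd
```

The lazy unmount (`umount -lf`) detaches the dead mount immediately instead of blocking on the unreachable old server.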


Stop the drbd and heartbeat services on both primary nodes

[root@test-master ~]# vim /etc/ha.d/haresources

test-master    IPaddr::10.96.20.8/24/eth0    drbddisk::data    Filesystem::/dev/drbd0::/drbd::ext4    killnfs

[root@test-master ~]# cd /etc/ha.d/resource.d/

[root@test-master resource.d]# vim killnfs

---------------script start-------------
#!/bin/bash
# killnfs: run by heartbeat during failover. The NFS server keeps client
# state in the kernel nfsd threads, so kill them repeatedly to make sure
# they are gone, then start nfs fresh so clients can re-establish state.
for i in {1..10}; do
        killall nfsd
done
service nfs start
exit 0
----------------script end--------------

[root@test-master resource.d]# chmod 755 killnfs

[root@test-master resource.d]# ll killnfs

-rwxr-xr-x. 1 root root 79 Aug  9 21:02 killnfs

[root@test-master resource.d]# scp killnfs root@test-backup:/etc/ha.d/resource.d/

killnfs                                               100%   79     0.1KB/s   00:00

[root@test-master resource.d]# cd ..

[root@test-master ha.d]# scp haresources root@test-backup:/etc/ha.d/

haresources                                           100% 6003     5.9KB/s   00:00

With drbd back in order, start heartbeat again and retest: the nfs slave now rides through a primary switchover normally, with no failed mounts and no hangs

wKioL1etJibj9SkcAACNMbI8qo0924.jpg

Note: the key precondition when debugging is that drbd itself is healthy; only then start heartbeat, and there will be no problems
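That precondition can be enforced with a pre-start check. A sketch, using the `cstate`/`dstate` subcommands of drbdadm on DRBD 8.x; the resource name `data` comes from this setup:

```shell
#!/bin/bash
# Sketch: only start heartbeat when DRBD reports a healthy state.
# Healthy means connection state "Connected" and local disk state "UpToDate".
drbd_healthy() {
    local cstate=$1 dstate=$2   # e.g. "Connected" and "UpToDate/UpToDate"
    [ "$cstate" = "Connected" ] && [ "${dstate%%/*}" = "UpToDate" ]
}

# in production:
#   drbd_healthy "$(drbdadm cstate data)" "$(drbdadm dstate data)" \
#       && service heartbeat start || echo "fix drbd first"
if drbd_healthy "Connected" "UpToDate/UpToDate"; then
    echo "drbd healthy, safe to start heartbeat"
fi
```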


Note: evolution of the ganji image-serving architecture

wKioL1etJjXiRjytAABNwC_03q0381.jpg


wKioL1etJkWD7HrqAABclO3ZYg8364.jpg

Note: after a user uploads an image to a web server, the web server POSTs the image to the image server matching the configured ID; PHP on that image server receives the POSTed image, writes it to local disk, and returns a success status code; on success, the front-end web server writes the image server's ID and the image path into the DB server. When a user later requests a page, the web server reads the image server ID and the image URL from the DB and fetches the image from the corresponding image server
