nfs+drbd+heartbeat

nfs+drbd+heartbeat: whenever NFS (or a distributed store such as MFS) has a single point of failure, this scheme can eliminate it.

In real production environments, NFS is one of the storage architectures most commonly used by small and medium-sized businesses. It is simple to deploy and easy to maintain: with nothing more than inotify+rsync you get simple, efficient synchronization of the NFS data to other machines in a master/slave layout, plus MySQL-style read/write splitting. The multiple read slaves can be load-balanced with LVS or haproxy, which both spreads the load of heavy concurrent reads and removes the slaves as single points of failure.
Storage on the web servers:
Option 1: reads and writes may hit any web server; inotify+rsync replicates each web server's data to the others (e.g. web1-->web2-->web3-->web2-->web1).
Option 2: configure the load balancer so that writes (file uploads) go only to web3 while reads go to web{1,2}; inotify+rsync then syncs web3-->web2 and web3-->web1.
Option 3: use shared NFS storage. A single NFS server taking both reads and writes is a single point of failure, so add a second box, one master and one backup, synced with inotify+rsync. Reads and writes can both stay on the master with the other box kept purely as a backup, or reads can go to the backup and writes to the master. Since workloads are usually read-heavy, add another backup to get one master and several slaves and spread the read load, syncing master-->backup1 and master-->backup2. If a storage node fails, web{1,2,3} must remount; losing a backup does no harm, but losing the master stops all writes. So make the master highly available as a master-active/master-inactive pair: only one of the two serves traffic at any moment, and master-inactive stays idle until a failover promotes it.
Option 4: drop the shared NFS mounts; keep master-active and master-inactive, write data to the shared storage only, then sync it back from the shared storage to the local disks of web{1,2,3}, so reads are served straight from local disk.
In the one-master-many-slaves model, if writes must survive the master dying and replication to the slaves must continue, use nfs+drbd+heartbeat to make the master highly available and remove it as a single point of failure. When the master-active NFS fails over to the master-inactive NFS, the two masters hold identical data, and the master-inactive NFS automatically resumes syncing with all the NFS slaves, giving a hot-standby NFS storage system.
When master-active fails over to the standby master-inactive, the backup node must still be able to push data to the NFS slaves, and at that point it must sync only the data changed since the switch rather than everything. sersync can replace inotify here, via its -r option (alternatively, leave inotify stopped at boot and start it only after heartbeat on the backup node is up and the filesystem is mounted).
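A minimal sketch of that second approach (the resource name data and the mount point /drbd match the setup built later in this section; the sersync install path and config file are assumptions):
---------------script start-------------
#!/bin/bash
# Run at boot on the backup node: wait until heartbeat has made this node
# DRBD Primary and mounted /drbd, then start sersync; -r does one full
# rsync pass first, so changes made around the switchover are not lost.
while ! ( drbdadm role data | grep -q '^Primary' && mountpoint -q /drbd ); do
        sleep 5
done
/usr/local/sersync/sersync2 -r -d -o /usr/local/sersync/confxml.xml
----------------script end--------------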
Note: compared with filesystems such as MFS, FastDFS and GFS, this scheme is simpler to deploy and easier to maintain and control, in keeping with the simple-and-efficient principle. It also has drawbacks: every node holds the complete dataset (just like MySQL replication), so syncing large volumes of files can easily lag. The sync can be split up by data directory (analogous to MySQL's database/table sharding); against lag you can also run several sync instances and control the read/write routing in the application, and you must be able to monitor the sync state.
An HA NFS design must also stop the NFS slaves from hanging on reads while the two master nodes switch over; attack it from these angles:
keep rpcbind running at all times (on the master node, the backup node and every NFS client);
have each NFS client (NFS slave) monitor its locally mounted NFS share and remount as soon as reads fail;
have the NFS clients watch the master-inactive node for the VIP appearing or for its DRBD state turning Primary, and remount when that happens (during an NFS failover the clients can be remounted via SSH or a similar mechanism; or monitor with nagios and, when the VIP shows up on the master-inactive node, run a script that remounts all the NFS clients).
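A minimal watchdog for the second point, run from cron on each NFS client (the mount point /data and the 10 s timeout are assumptions; the VIP and export match the ones configured later in this post):
---------------script start-------------
#!/bin/bash
# If reading the mounted share stalls or fails, force a lazy unmount and
# remount from the VIP.
MNT=/data                 # local mount point on the nfs client (assumed)
SRC=10.96.20.8:/drbd      # VIP:export as configured later in this post
if ! timeout 10 ls "$MNT" >/dev/null 2>&1; then
        umount -lf "$MNT"
        mount -t nfs -o rw,hard,intr "$SRC" "$MNT"
fi
----------------script end--------------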
As shown in the figure: the parts marked with ellipses are what this section implements.

Note: a single server needs no dedicated file storage and keeps its data locally; dedicated storage only becomes necessary once you run a cluster.

Note: the problems: a single point of failure, and poor performance with reads and writes on one box; the two things ops must always guarantee are data protection and 7x24 continuous service.
Note:
web1 and web2 typically run LNMP;
IMG1 and IMG2 typically run nginx or lighttpd;
this design removes the NFS master as a single point and fixes concurrent-read performance, but if concurrent writes keep growing, the following problems appear:
it copes with uploads of 200-300 images/s, where concurrent sync still keeps up; above 300 images/s the master-to-slave sync may start lagging. Remedies: sync with multiple threads and tune the watch events, disk I/O and network I/O;
with many IMG servers and a single master, the master both takes all writes and feeds every slave, so its load becomes very heavy;
when the image volume gets very large, every node holds the complete dataset; past about 3 TB a single server may run out of space. Remedies: (1) borrow MySQL's sharding idea to solve capacity, write performance and sync lag together, e.g. start with img1--img5, five directories mapped to five domains, each mounted separately, and let each imgNUM grow into its own NFS master/slave HA plus read/write-splitting cluster, with writes going through POST or WebDAV (a per-directory sync loop is sketched below); (2) scale out to multiple masters via DNS, though every new service added this way is a new single point; (3) use the built-in replication of MySQL, Oracle, MongoDB, Cassandra and similar databases to replicate the file data; iQIYI stores images in MongoDB's GridFS.
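A sketch of the directory-sharded sync from remedy (1); the directory names, the slave host and the rsync daemon modules are illustrative assumptions:
---------------script start-------------
#!/bin/bash
# One rsync per image directory, run in parallel, so each shard syncs
# independently and one busy directory cannot delay the others.
for dir in img1 img2 img3 img4 img5; do
        rsync -az --delete /data/$dir/ rsync://nfs-slave1/$dir/ &
done
wait
----------------script end--------------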
Note: MongoDB GridFS image storage (distributed; design ideas: each image is stored exactly once; store only the originals; generate the thumbnail and its static file on the first request; fixed URL scheme, with different URLs producing different thumbnails; see the article "Abusing Amazon images").
Note: Facebook's photo-serving architecture.
Note:
making NFS highly available removes the single point at the price of one mostly idle server;
the two NFS masters run heartbeat+drbd, replicating in real time with DRBD protocol C;
nfs(M) and nfs(S) sync asynchronously via inotify+rsync, with nfs(S) syncing from nfs(M) through the VIP; the nfs slaveNUM boxes serve reads and the nfs master serves writes, which solves concurrent-read performance; alternatively, let the nfs master take writes only and push the data out to the app servers (the variant that drops the NFS mounts);
choose RAID10 or RAID0 for the physical disks according to your performance and redundancy needs; servers, and the links between servers and switches, use dual gigabit NICs with bonding; application servers (web servers included, but not only) reach nfs(M) through its VIP and reach the load-balanced nfs(S) storage pool through other VIPs; nfs(M)'s data lives on the DRBD partition;
with modest data volumes, the data can be synced from nfs(M) straight to the app servers' local disks: all reads are then local, while writes still go to nfs(M);
with inotify+rsync doing the master-->slave sync, heavy concurrent writes will cause lag or missed updates;
Note:
In real-world operations, you touch the DB and file-storage layers only as a last resort; day to day, adjust the site architecture instead so that user requests hit the DB and storage as little as possible, e.g. with file caches and data caches (the core rule of high concurrency: push every user request as far toward the front as possible), rather than reaching for a distributed storage system straight away. For small and medium businesses, distributed storage is a cannon aimed at a mosquito; in 2012, when Facebook was already huge, it still ran an NFS storage system (distributed systems are no silver bullet: they consume a lot of people and hardware, and handled badly they end in disaster).
Note:
To relieve pressure on the site, push the content users fetch as far forward as possible: what can live on the user's machine should not go to the CDN, and what can live on the CDN should not go to your own servers; use every cache layer fully and let requests reach the backend DB only when there is no other way. If that still is not enough: use SSD+SATA tiering, and if that also fails, move to distributed storage.
1. Install and configure heartbeat
Environment:
VIP: 10.96.20.8
master: eth0 (10.96.20.113), eth1 (172.16.1.113, no gateway or DNS), hostname test-master
backup: eth0 (10.96.20.114), eth1 (172.16.1.114, no gateway or DNS), hostname test-backup
two NICs and two disks per node
Note: eth0 is the management IP; eth1 carries the heartbeat and the DRBD replication channel. In production, if heartbeat and data replication share one NIC, rate-limit the replication so the heartbeat always has bandwidth left (one way is sketched below).
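One way to reserve that headroom, sketched with tc/HTB on a shared 1 GbE NIC (the rates are assumptions; DRBD's own syncer rate setting, used below, is the simpler in-band alternative):
#tc qdisc add dev eth1 root handle 1: htb default 20
#tc class add dev eth1 parent 1: classid 1:1 htb rate 1000mbit
#tc class add dev eth1 parent 1:1 classid 1:10 htb rate 900mbit ceil 900mbit   # DRBD replication
#tc class add dev eth1 parent 1:1 classid 1:20 htb rate 100mbit ceil 1000mbit  # everything else, heartbeat included
#tc filter add dev eth1 parent 1: protocol ip prio 1 u32 match ip dport 7788 0xffff flowid 1:10   # DRBD port as configured below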
Note: keep your VMware labels and Xshell tabs tidy; in a production environment every host should have an entry in /etc/hosts, which simplifies distribution, management and maintenance.

test-master (on each node configure the hostname so that /etc/sysconfig/network matches `uname -n` exactly, plus /etc/hosts, SSH mutual trust, time sync, iptables and SELinux):
[root@test-master ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.5 (Santiago)
[root@test-master ~]# uname -rm
2.6.32-431.el6.x86_64 x86_64
[root@test-master ~]# uname -n
test-master
[root@test-master ~]# ifconfig | grep eth0 -A 1
eth0      Link encap:Ethernet  HWaddr 00:0C:29:1F:B6:AC
          inet addr:10.96.20.113  Bcast:10.96.20.255  Mask:255.255.255.0
[root@test-master ~]# ifconfig | grep eth1 -A 1
eth1      Link encap:Ethernet  HWaddr 00:0C:29:1F:B6:B6
          inet addr:172.16.1.113  Bcast:172.16.1.255  Mask:255.255.255.0
[root@test-master ~]# route add -host 172.16.1.114 dev eth1   # (add a host route so heartbeat traffic leaves through the chosen NIC; append this to /etc/rc.local, or configure a static route: #vim /etc/sysconfig/network-scripts/route-eth1 and add 172.16.1.114/32 dev eth1)
[root@test-master ~]# ssh-keygen -t rsa -f ./.ssh/id_rsa -P ''
Generating public/private rsa key pair.
Your identification has been saved in ./.ssh/id_rsa.
Your public key has been saved in ./.ssh/id_rsa.pub.
The key fingerprint is:
29:c3:a3:68:81:43:59:2f:0a:ad:8a:54:56:b0:1e:12 root@test-master
The key's randomart image is:
+--[ RSA 2048]----+
| E o..           |
| .+ +            |
|.+.* .           |
|oo* o.  .        |
|+o..  =S         |
|+. o . +         |
|o o .            |
| .               |
|                 |
+-----------------+
[root@test-master ~]# ssh-copy-id -i ./.ssh/id_rsa root@test-backup
The authenticity of host 'test-backup (10.96.20.114)' can't be established.
RSA key fingerprint is 63:f5:2e:dc:96:64:54:72:8e:14:7e:ec:ef:b8:a1:0c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'test-backup' (RSA) to the list of known hosts.
root@test-backup's password:
Now try logging into the machine, with "ssh 'root@test-backup'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
[root@test-master ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null
[root@test-master ~]# service crond restart
Stopping crond:                                            [  OK  ]
Starting crond:                                            [  OK  ]
[root@test-master ~]# wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
[root@test-master ~]# rpm -ivh epel-release-6-8.noarch.rpm
warning: epel-release-6-8.noarch.rpm: Header V3 RSA/SHA256 Signature, key ID 0608b895: NOKEY
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]
[root@test-master ~]# yum search heartbeat
……
heartbeat-devel.i686 : Heartbeat development package
heartbeat-devel.x86_64 : Heartbeat development package
heartbeat-libs.i686 : Heartbeat libraries
heartbeat-libs.x86_64 : Heartbeat libraries
heartbeat.x86_64 : Messaging and membership subsystem for High-Availability Linux
[root@test-master ~]# yum -y install heartbeat
[root@test-master ~]# chkconfig heartbeat off
[root@test-master ~]# chkconfig --list heartbeat
heartbeat         0:off 1:off 2:off 3:off 4:off 5:off 6:off
test-backup:
[root@test-backup ~]# uname -n
test-backup
[root@test-backup ~]# ifconfig | grep eth0 -A 1
eth0      Link encap:Ethernet  HWaddr 00:0C:29:15:E6:BB
          inet addr:10.96.20.114  Bcast:10.96.20.255  Mask:255.255.255.0
[root@test-backup ~]# ifconfig | grep eth1 -A 1
eth1      Link encap:Ethernet  HWaddr 00:0C:29:15:E6:C5
          inet addr:172.16.1.114  Bcast:172.16.1.255  Mask:255.255.255.0
[root@test-backup ~]# route add -host 172.16.1.113 dev eth1
[root@test-backup ~]# ssh-keygen -t rsa -f ./.ssh/id_rsa -P ''
Generating public/private rsa key pair.
Your identification has been saved in ./.ssh/id_rsa.
Your public key has been saved in ./.ssh/id_rsa.pub.
The key fingerprint is:
08:ea:6a:44:7f:1a:c9:bf:ff:01:d5:32:e5:39:1b:b8 root@test-backup
The key's randomart image is:
+--[ RSA 2048]----+
|          .      |
|           =.    |
|    .     = *    |
| . . . .. + +    |
|. + . ..SE .     |
| o = .  .        |
|. . =   .        |
| o . .   .       |
|o   .o...        |
+-----------------+
[root@test-backup ~]# ssh-copy-id -i ./.ssh/id_rsa root@test-master
The authenticity of host 'test-master (10.96.20.113)' can't be established.
RSA key fingerprint is 63:f5:2e:dc:96:64:54:72:8e:14:7e:ec:ef:b8:a1:0c.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'test-master' (RSA) to the list of known hosts.
root@test-master's password:
Now try logging into the machine, with "ssh 'root@test-master'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.
[root@test-backup ~]# crontab -l
*/5 * * * * /usr/sbin/ntpdate time.windows.com &> /dev/null
[root@test-backup ~]# service crond restart
Stopping crond:                                            [  OK  ]
Starting crond:                                            [  OK  ]
[root@test-backup ~]# wget http://mirrors.ustc.edu.cn/fedora/epel/6/x86_64/epel-release-6-8.noarch.rpm
[root@test-backup ~]# rpm -ivh epel-release-6-8.noarch.rpm
[root@test-backup ~]# yum -y install heartbeat
[root@test-backup ~]# chkconfig heartbeat off
[root@test-backup ~]# chkconfig --list heartbeat
heartbeat         0:off 1:off 2:off 3:off 4:off 5:off 6:off
test-master:
[root@test-master ~]# cp /usr/share/doc/heartbeat-3.0.4/{ha.cf,authkeys,haresources} /etc/ha.d/
[root@test-master ~]# cd /etc/ha.d
[root@test-master ha.d]# ls
authkeys  ha.cf  harc  haresources  rc.d  README.config  resource.d  shellfuncs
[root@test-master ha.d]# vim authkeys   # (generate the random string with #dd if=/dev/random count=1 bs=512 | md5sum, and put it after sha1)
auth 1
1 sha1 912d6402295ac8d47109e56b177073b9
[root@test-master ha.d]# chmod 600 authkeys   # (this file must be mode 600, otherwise heartbeat fails to start)
[root@test-master ha.d]# ll !$
ll authkeys
-rw-------. 1 root root 692 Aug  7 21:51 authkeys
[root@test-master ha.d]# vim ha.cf
debugfile /var/log/ha-debug   # (debug log)
logfile /var/log/ha-log
logfacility     local1   # (configure the rsyslog service to receive logs via local1)
keepalive 2   # (heartbeat interval: send a heartbeat every 2 s)
deadtime 30   # (if the backup node receives no heartbeat from the master node for 30 s, it immediately takes over the peer's service resources)
warntime 10   # (heartbeat-delay threshold: if the backup node hears nothing from the master for 10 s it writes a warning to the log, but does not fail over yet)
initdead 120   # (wait 120 s after heartbeat first starts before bringing up this node's resources, so the peer's heartbeat has time to start; must be at least twice deadtime)
udpport 694
#bcast  eth0   # (send heartbeats as Ethernet broadcast on eth0; to send over two physical networks use bcast eth0 eth1)
mcast eth0 225.0.0.11 694 1 0   # (multicast parameters; the multicast address must be unique within the LAN, since several heartbeat clusters may coexist; use a class D address (224.0.0.0-239.255.255.255); format: mcast dev mcast_group port ttl loop)
auto_failback on   # (fail back once the master node recovers)
node test-master   # (master node hostname, the result of uname -n)
node test-backup   # (backup node hostname)
crm no   # (whether to enable the CRM)
[root@test-master ha.d]# vim haresources
test-master     IPaddr::10.96.20.8/24/eth0   # (this entry is equivalent to running #/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 stop|start; IPaddr is simply the script under /etc/ha.d/resource.d/)
[root@test-master ha.d]# scp authkeys ha.cf haresources root@test-backup:/etc/ha.d/
authkeys                                      100%  692     0.7KB/s   00:00
ha.cf                                         100%   10KB  10.3KB/s   00:00
haresources                                   100% 5944     5.8KB/s   00:00
[root@test-master ha.d]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.

[root@test-master ha.d]# ssh test-backup 'service heartbeat start'
Starting High-Availability services: 2016/08/07_22:39:00 INFO:  Resource is stopped
Done.
[root@test-master ha.d]# ps aux | grep heartbeat
root      63089  0.0  3.1  50124  7164 ?        SLs  22:38   0:00 heartbeat: master control process
root      63093  0.0  3.1  50076  7116 ?        SL   22:38   0:00 heartbeat: FIFO reader
root      63094  0.0  3.1  50072  7112 ?        SL   22:38   0:00 heartbeat: write: mcast eth0
root      63095  0.0  3.1  50072  7112 ?        SL   22:38   0:00 heartbeat: read: mcast eth0
root      63136  0.0  0.3 103264   836 pts/0    S+   22:39   0:00 grep heartbeat
[root@test-master ha.d]# ssh test-backup 'ps aux | grep heartbeat'
root       3050  0.0  3.1  50124  7164 ?        SLs  22:39   0:00 heartbeat: master control process
root       3054  0.0  3.1  50076  7116 ?        SL   22:39   0:00 heartbeat: FIFO reader
root       3055  0.0  3.1  50072  7112 ?        SL   22:39   0:00 heartbeat: write: mcast eth0
root       3056  0.0  3.1  50072  7112 ?        SL   22:39   0:00 heartbeat: read: mcast eth0
root       3094  0.0  0.5 106104  1368 ?        Ss   22:39   0:00 bash -c ps aux | grep heartbeat
root       3108  0.0  0.3 103264   832 ?        S    22:39   0:00 grep heartbeat
[root@test-master ha.d]# netstat -tnulp | grep heartbeat
udp        0      0 225.0.0.11:694              0.0.0.0:*                               63094/heartbeat: wr
udp        0      0 0.0.0.0:50268               0.0.0.0:*                               63094/heartbeat: wr
[root@test-master ha.d]# ssh test-backup 'netstat -tnulp | grep heartbeat'
udp        0      0 0.0.0.0:58019               0.0.0.0:*                               3055/heartbeat: wri
udp        0      0 225.0.0.11:694              0.0.0.0:*                               3055/heartbeat: wri
[root@test-master ha.d]# ip addr | grep 10.96.20
    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
[root@test-master ha.d]# service heartbeat stop
Stopping High-Availability services: Done.

[root@test-master ha.d]# ip addr | grep 10.96.20
    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0
[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
[root@test-master ha.d]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.

[root@test-master ha.d]# ip addr | grep 10.96.20
    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
[root@test-master ha.d]# ssh test-backup 'ip addr | grep 10.96.20'
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
[root@test-master ~]# service heartbeat stop
Stopping High-Availability services: Done.

[root@test-master ~]# ssh test-backup 'service heartbeat stop'
Stopping High-Availability services: Done.
2. Install and configure drbd
test-master:
[root@test-master ~]# fdisk -l
……
Disk /dev/sdb: 2147 MB, 2147483648 bytes
255 heads, 63 sectors/track, 261 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000
[root@test-master ~]# parted /dev/sdb   # (parted also handles disks larger than 2 TB; split the new disk into two partitions, one for the data and one for DRBD's meta data)
GNU Parted 2.1
Using /dev/sdb
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) h
  align-check TYPE N                       check partition N for TYPE(min|opt) alignment
  check NUMBER                             do a simple check on the file system
  cp [FROM-DEVICE] FROM-NUMBER TO-NUMBER   copy file system to another partition
  help [COMMAND]                           print general help, or help on COMMAND
  mklabel,mktable LABEL-TYPE               create a new disklabel (partition table)
  mkfs NUMBER FS-TYPE                      make a FS-TYPE file system on partition NUMBER
  mkpart PART-TYPE [FS-TYPE] START END     make a partition
  mkpartfs PART-TYPE FS-TYPE START END     make a partition with a file system
  move NUMBER START END                    move partition NUMBER
  name NUMBER NAME                         name partition NUMBER as NAME
  print [devices|free|list,all|NUMBER]     display the partition table, available devices, free space, all found partitions, or a particular partition
  quit                                     exit program
  rescue START END                         rescue a lost partition near START and END
  resize NUMBER START END                  resize partition NUMBER and its file system
  rm NUMBER                                delete partition NUMBER
  select DEVICE                            choose the device to edit
  set NUMBER FLAG STATE                    change the FLAG on partition NUMBER
  toggle [NUMBER [FLAG]]                   toggle the state of FLAG on partition NUMBER
  unit UNIT                                set the default unit to UNIT
  version                                  display the version number and copyright information of GNU Parted
(parted) mklabel gpt
(parted) mkpart primary 0 1024
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? Ignore
(parted) mkpart primary 1025 2147
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? Ignore
(parted) p
Model: VMware, VMware Virtual S (scsi)
Disk /dev/sdb: 2147MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  1024MB  1024MB               primary
 2      1025MB  2147MB  1122MB               primary
[root@test-master ~]# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
[root@test-master ~]# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm
warning: elrepo-release-6-6.el6.elrepo.noarch.rpm: Header V4 DSA/SHA1 Signature, key ID baadae52: NOKEY
Preparing...                ########################################### [100%]
   1:elrepo-release         ########################################### [100%]
[root@test-master ~]# yum -y install drbd kmod-drbd84
[root@test-master ~]# modprobe drbd
FATAL: Module drbd not found.
[root@test-master ~]# yum -y install kernel*   # (after updating the kernel, reboot the system)
[root@test-master ~]# uname -r
2.6.32-642.3.1.el6.x86_64
[root@test-master ~]# depmod
[root@test-master ~]# lsmod | grep drbd
drbd                  372759  0
libcrc32c               1246  1 drbd
[root@test-master ~]# ll /usr/src/kernels/
total 12
drwxr-xr-x. 22 root root 4096 Mar 31 06:46 2.6.32-431.el6.x86_64
drwxr-xr-x. 22 root root 4096 Aug  8 03:40 2.6.32-642.3.1.el6.x86_64
drwxr-xr-x. 22 root root 4096 Aug  8 03:40 2.6.32-642.3.1.el6.x86_64.debug
[root@test-master ~]# echo "modprobe drbd >/dev/null 2>&1" > /etc/sysconfig/modules/drbd.modules
[root@test-master ~]# cat !$
cat /etc/sysconfig/modules/drbd.modules
modprobe drbd > /dev/null 2>&1
test-backup:
[root@test-backup ~]# parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart primary 0 4096
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? Ignore
(parted) mkpart primary 4097 5368
(parted) p
Model: VMware, VMware Virtual S (scsi)
Disk /dev/sdb: 5369MB
Sector size (logical/physical): 512B/512B
Partition Table: gpt

Number  Start   End     Size    File system  Name     Flags
 1      17.4kB  4096MB  4096MB               primary
 2      4097MB  5368MB  1271MB               primary
[root@test-backup ~]# wget http://www.elrepo.org/elrepo-release-6-6.el6.elrepo.noarch.rpm
[root@test-backup ~]# rpm -ivh elrepo-release-6-6.el6.elrepo.noarch.rpm
[root@test-backup ~]# ll /etc/yum.repos.d/
total 20
-rw-r--r--. 1 root root 1856 Jul 19 00:28 CentOS6-Base-163.repo
-rw-r--r--. 1 root root 2150 Feb  9  2014 elrepo.repo
-rw-r--r--. 1 root root  957 Nov  4  2012 epel.repo
-rw-r--r--. 1 root root 1056 Nov  4  2012 epel-testing.repo
-rw-r--r--. 1 root root  529 Mar 30 23:00 rhel-source.repo.bak
[root@test-backup ~]# yum -y install drbd kmod-drbd84
[root@test-backup ~]# yum -y install kernel*
[root@test-backup ~]# depmod
[root@test-backup ~]# lsmod | grep drbd
drbd                  372759  0
libcrc32c               1246  1 drbd
[root@test-backup ~]# chkconfig drbd off
[root@test-backup ~]# chkconfig --list drbd
drbd             0:off 1:off 2:off 3:off 4:off 5:off 6:off
[root@test-backup ~]# echo "modprobe drbd >/dev/null 2>&1" > /etc/sysconfig/modules/drbd.modules
[root@test-backup ~]# cat !$
cat /etc/sysconfig/modules/drbd.modules
modprobe drbd > /dev/null 2>&1
test-master:
[root@test-master ~]# vim /etc/drbd.d/global_common.conf
[root@test-master ~]# egrep -v "#|^$" /etc/drbd.d/global_common.conf
global {
        usage-count no;
}
common {
        handlers {
        }
        startup {
        }
        options {
        }
        disk {
                on-io-error detach;
        }
        net {
        }
        syncer {
                rate 50M;
                verify-alg crc32c;
        }
}
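With verify-alg set, an online consistency check of the replicated device can later be run from the primary; this is standard drbdadm usage rather than a step from this walkthrough:
#drbdadm verify data   # compares blocks with crc32c; results appear in the kernel log
#drbdadm disconnect data && drbdadm connect data   # resyncs any out-of-sync blocks found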
[root@test-master ~]# vim /etc/drbd.d/data.res
resource data {
        protocol C;
        on test-master {
                device  /dev/drbd0;
                disk    /dev/sdb1;
                address 172.16.1.113:7788;
                meta-disk       /dev/sdb2[0];
        }
        on test-backup {
                device  /dev/drbd0;
                disk    /dev/sdb1;
                address 172.16.1.114:7788;
                meta-disk       /dev/sdb2[0];
        }
}
[root@test-master ~]# cd /etc/drbd.d
[root@test-master drbd.d]# scp global_common.conf data.res root@test-backup:/etc/drbd.d/
global_common.conf                            100% 2144     2.1KB/s   00:00
data.res                                      100%  251     0.3KB/s   00:00
[root@test-master drbd.d]# drbdadm --help
USAGE: drbdadm COMMAND [OPTION...] {all|RESOURCE...}
GENERAL OPTIONS:
  --stacked, -S
  --dry-run, -d
  --verbose, -v
  --config-file=..., -c ...
  --config-to-test=..., -t ...
  --drbdsetup=..., -s ...
  --drbdmeta=..., -m ...
  --drbd-proxy-ctl=..., -p ...
  --sh-varname=..., -n ...
  --peer=..., -P ...
  --version, -V
  --setup-option=..., -W ...
  --help, -h

COMMANDS:
 attach                              disk-options
 detach                              connect
 net-options                         disconnect
 up                                  resource-options
 down                                primary
 secondary                           invalidate
 invalidate-remote                   outdate
 resize                              verify
 pause-sync                          resume-sync
 adjust                              adjust-with-progress
 wait-connect                        wait-con-int
 role                                cstate
 dstate                              dump
 dump-xml                            create-md
 show-gi                             get-gi
 dump-md                             wipe-md
 apply-al                            hidden-commands
[root@test-master drbd.d]# drbdadm create-md data
initializing activity log
NOT initializing bitmap
Writing meta data...
New drbd meta data block successfully created.
[root@test-master drbd.d]# ssh test-backup 'drbdadm create-md data'
NOT initializing bitmap
initializing activity log
Writing meta data...
New drbd meta data block successfully created.
[root@test-master drbd.d]# drbdadm up data
[root@test-master drbd.d]# ssh test-backup 'drbdadm up data'
[root@test-master drbd.d]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:999984
[root@test-master drbd.d]# ssh test-backup 'cat /proc/drbd'
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Secondary/Secondary ds:Inconsistent/Inconsistent C r-----
    ns:0 nr:0 dw:0 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:999984
[root@test-master drbd.d]# drbdadm -- --overwrite-data-of-peer primary data   # (run on the master only)
[root@test-master drbd.d]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:339968 nr:0 dw:0 dr:340647 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:660016
        [=====>..............] sync'ed: 34.3% (660016/999984)K
        finish: 0:00:15 speed: 42,496 (42,496) K/sec
[root@test-master drbd.d]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:630784 nr:0 dw:0 dr:631463 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:369200
        [===========>........] sync'ed: 63.3% (369200/999984)K
        finish: 0:00:09 speed: 39,424 (39,424) K/sec
[root@test-master drbd.d]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:SyncSource ro:Primary/Secondary ds:UpToDate/Inconsistent C r-----
    ns:942080 nr:0 dw:0 dr:942759 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:57904
        [=================>..] sync'ed: 94.3% (57904/999984)K
        finish: 0:00:01 speed: 39,196 (39,252) K/sec
[root@test-master drbd.d]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:999983 nr:0 dw:0 dr:1000662 al:8 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@test-master drbd.d]# ssh test-backup 'cat /proc/drbd'
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
    ns:0 nr:999983 dw:999983 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@test-master drbd.d]# mkdir /drbd
[root@test-master drbd.d]# ssh test-backup 'mkdir /drbd'
[root@test-master drbd.d]# mkfs.ext4 -b 4096 /dev/drbd0   # (run on the master only; never format the meta partition)
Writing superblocks and filesystem accounting information: done
[root@test-master drbd.d]# tune2fs -c -1 /dev/drbd0
tune2fs 1.41.12 (17-May-2010)
Setting maximal mount count to -1
[root@test-master drbd.d]# mount /dev/drbd0 /drbd
[root@test-master drbd.d]# cd /drbd
[root@test-master drbd]# for i in `seq 1 10`; do touch test$i; done
[root@test-master drbd]# ls
lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9
[root@test-master drbd]# cd
[root@test-master ~]# umount /dev/drbd0
[root@test-master ~]# drbdadm secondary data
[root@test-master ~]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:1032538 nr:0 dw:32554 dr:1001751 al:19 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
test-backup:
[root@test-backup ~]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Secondary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1032538 dw:1032538 dr:0 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@test-backup ~]# drbdadm primary data
[root@test-backup ~]# cat /proc/drbd
version: 8.4.7-1 (api:1/proto:86-101)
GIT-hash: 3a6a769340ef93b1ba2792c6461250790795db49 build by mockbuild@Build64R6, 2016-01-12 13:27:11
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:0 nr:1032538 dw:1032538 dr:679 al:16 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
[root@test-backup ~]# mount /dev/drbd0 /drbd
[root@test-backup ~]# ls /drbd
lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9
3. Debug heartbeat+drbd
[root@test-master ~]# ssh test-backup 'umount /drbd'
[root@test-master ~]# ssh test-backup 'drbdadm secondary data'
[root@test-master ~]# service drbd stop
Stopping all DRBD resources: .
[root@test-master ~]# ssh test-backup 'service drbd stop'
Stopping all DRBD resources: .
[root@test-master ~]# service heartbeat status
heartbeat is stopped. No process
[root@test-master ~]# ssh test-backup 'service heartbeat status'
heartbeat is stopped. No process
[root@test-master ~]# ll /etc/ha.d/resource.d/{Filesystem,drbddisk}
-rwxr-xr-x. 1 root root 3162 Jan 12  2016 /etc/ha.d/resource.d/drbddisk
-rwxr-xr-x. 1 root root 1903 Dec  2  2013 /etc/ha.d/resource.d/Filesystem
[root@test-master ~]# vim /etc/ha.d/haresources   # (each entry on this line is a script-plus-arguments invocation, e.g. #/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 start|stop, #/etc/ha.d/resource.d/drbddisk data start|stop, #/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext4 start|stop; heartbeat simply drives the configured resources in order, so when heartbeat misbehaves, check the log and run these commands by hand to isolate the fault; the manual equivalents are listed after the scp below)
test-master     IPaddr::10.96.20.8/24/eth0      drbddisk::data  Filesystem::/dev/drbd0::/drbd::ext4
[root@test-master ~]# scp /etc/ha.d/haresources root@test-backup:/etc/ha.d/
haresources                                   100% 5996     5.9KB/s   00:00
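The manual equivalents from the comment above, useful for fault isolation:
#/etc/ha.d/resource.d/IPaddr 10.96.20.8/24/eth0 start   # bring up the VIP
#/etc/ha.d/resource.d/drbddisk data start   # promote DRBD resource data
#/etc/ha.d/resource.d/Filesystem /dev/drbd0 /drbd ext4 start   # mount it
(run the same commands with stop, in reverse order, to release the resources)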
[root@test-master ~]# service drbd start   # (run on the master node)
Starting DRBD resources: [
     create res: data
   prepare disk: data
    adjust disk: data
     adjust net: data
]
..........
***************************************************************
 DRBD's startup script waits for the peer node(s) to appear.
 - If this node was already a degraded cluster before the
   reboot, the timeout is 0 seconds. [degr-wfc-timeout]
 - If the peer was available before the reboot, the timeout
   is 0 seconds. [wfc-timeout]
   (These values are for resource 'data'; 0 sec -> wait forever)
 To abort waiting enter 'yes' [  23]:
[root@test-backup ~]# service drbd start   # (run on the backup node)
Starting DRBD resources: [
     create res: data
   prepare disk: data
    adjust disk: data
     adjust net: data
]
.
[root@test-master ~]# drbdadm role data
Secondary/Secondary
[root@test-master ~]# ssh test-backup 'drbdadm role data'
Secondary/Secondary
[root@test-master ~]# drbdadm -- --overwrite-data-of-peer primary data
[root@test-master ~]# drbdadm role data
Primary/Secondary
[root@test-master ~]# service heartbeat start
Starting High-Availability services: INFO:  Resource is stopped
Done.
[root@test-master ~]# ssh test-backup 'service heartbeat start'
Starting High-Availability services: 2016/08/09_03:08:11 INFO:  Resource is stopped
Done.
[root@test-master ~]# ip addr | grep 10.96.20
    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
[root@test-master ~]# drbdadm role data
Primary/Secondary
[root@test-master ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  6.3G   11G  38% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd
[root@test-master ~]# ls /drbd
lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9
[root@test-master ~]# service heartbeat stop
Stopping High-Availability services: Done.
[root@test-master ~]# ssh test-backup 'ip addr | grep 10.96.20'
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
[root@test-master ~]# ssh test-backup 'df -h'
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  3.9G   13G  24% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd
[root@test-master ~]# ssh test-backup 'ls /drbd'
lost+found
test1
test10
test2
test3
test4
test5
test6
test7
test8
test9
[root@test-master ~]# drbdadm role data
Secondary/Primary
[root@test-master ~]# service heartbeat start   # (after the master node recovers, first get DRBD straightened out and healthy, then start the heartbeat service)
Starting High-Availability services: INFO:  Resource is stopped
Done.
[root@test-master ~]# drbdadm role data
Primary/Secondary
[root@test-master ~]# ip addr | grep 10.96.20
    inet 10.96.20.113/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
[root@test-master ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  6.3G   11G  38% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd
[root@test-master ~]# ls /drbd
lost+found  test1  test10  test2  test3  test4  test5  test6  test7  test8  test9
Note: if the two ends show Primary/Unknown or Secondary/Unknown, recover as follows:
#service heartbeat stop   # (stop the heartbeat service on both ends)
#drbdadm secondary data   # (demote DRBD on the backup node to secondary)
#drbdadm disconnect data
#drbdadm -- --discard-my-data connect data
#drbdadm role data
#drbdadm connect data   # (run on the master node)
4. Install and configure nfs
Run the following on both master nodes and on nfs slave1:
[root@test-master ~]# yum -y groupinstall 'NFS file server'
[root@test-master ~]# rpm -qa nfs-utils rpcbind
nfs-utils-1.2.3-70.el6_8.1.x86_64
rpcbind-0.2.0-12.el6.x86_64
[root@test-master ~]# service rpcbind start
[root@test-master ~]# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting RPC idmapd:                                       [  OK  ]
[root@test-master ~]# chkconfig rpcbind on
[root@test-master ~]# chkconfig nfs on
[root@test-master ~]# chkconfig --list rpcbind
rpcbind           0:off 1:off 2:on 3:on 4:on 5:on 6:off
[root@test-master ~]# chkconfig --list nfs
nfs               0:off 1:off 2:on 3:on 4:on 5:on 6:off
Run on both master nodes:
[root@test-master ~]# vim /etc/exports
/drbd   10.96.20.*(rw,sync,all_squash,anonuid=65534,anongid=65534,mp,fsid=2)
[root@test-master ~]# chmod 777 -R /drbd
[root@test-master ~]# service nfs reload   # (equivalent to #exportfs -r)
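From any NFS client the export can then be checked against the VIP and mounted; a hypothetical verification, not part of the original transcript (the mount point /data is an assumption):
#showmount -e 10.96.20.8   # should list /drbd exported to 10.96.20.*
#mkdir -p /data
#mount -t nfs 10.96.20.8:/drbd /data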
5. Test
Start heartbeat on both masters.
Tested from nfs-slave: everything works.
[root@test-master ~]# service heartbeat stop
Stopping High-Availability services:
/sbin/service: line 66: 17235 Killed                  env -i PATH="$PATH" TERM="$TERM" "${SERVICEDIR}/${SERVICE}" ${OPTIONS}
[root@test-master ~]# tail -f /var/log/ha-log   # (the test shows that while heartbeat is being stopped, the failover keeps failing to unmount the mounted partition and finally force-reboots the server)
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:21 INFO: No processes on /drbd were signalled. force_unmount is
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:22 ERROR: Couldn't unmount /drbd; trying cleanup with KILL
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:22 INFO: No processes on /drbd were signalled. force_unmount is
Filesystem(Filesystem_/dev/drbd0)[19791]:  2016/08/09_04:36:23 ERROR: Couldn't unmount /drbd, giving up!
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[19783]: 2016/08/09_04:36:23 ERROR:  Generic error
ResourceManager(default)[17256]:        2016/08/09_04:36:23 ERROR: Return code 1 from /etc/ha.d/resource.d/Filesystem
/usr/lib/ocf/resource.d//heartbeat/Filesystem(Filesystem_/dev/drbd0)[20014]: 2016/08/09_04:36:23 INFO:  Running OK
ResourceManager(default)[17256]:        2016/08/09_04:36:23 CRIT: Resource STOP failure. Reboot required!
ResourceManager(default)[17256]:        2016/08/09_04:36:23 CRIT: Killing heartbeat ungracefully!
[root@test-backup ~]# drbdadm role data   # (after the master side rebooted, the backup node shows it has taken over)
Primary/Unknown
[root@test-backup ~]# ip addr
……
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:15:e6:bb brd ff:ff:ff:ff:ff:ff
    inet 10.96.20.114/24 brd 10.96.20.255 scope global eth0
    inet 10.96.20.8/24 brd 10.96.20.255 scope global secondary eth0
    inet6 fe80::20c:29ff:fe15:e6bb/64 scope link
       valid_lft forever preferred_lft forever
[root@test-backup ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        18G  3.9G   13G  24% /
tmpfs           112M     0  112M   0% /dev/shm
/dev/sda1       283M   83M  185M  31% /boot
/dev/sr0        3.6G  3.6G     0 100% /mnt/cdrom
/dev/drbd0      946M  1.3M  896M   1% /drbd
[root@test-backup ~]# ls /drbd
lost+found  test111  test2  test222.txt  test3  test4  test5  test6  test7  test8  test9
Hot standby between the two master nodes now works, but the NFS slave hangs forever when it tries to mount: the server side (the active NFS master) keeps state about its clients' mounts, so the NFS server must be restarted on failover. Therefore add a script to heartbeat's haresources file so that NFS is restarted during the switch.
Stop the drbd and heartbeat services on both master nodes.
[root@test-master ~]# vim /etc/ha.d/haresources
test-master     IPaddr::10.96.20.8/24/eth0      drbddisk::data  Filesystem::/dev/drbd0::/drbd::ext4     killnfs
[root@test-master ~]# cd /etc/ha.d/resource.d/
[root@test-master resource.d]# vim killnfs
---------------script start-------------
#!/bin/bash
# killnfs: kill any lingering nfsd threads, then start NFS afresh so that
# clients renegotiate their mount state after a failover.

for i in {1..10}; do
        killall nfsd
done
service nfs start
exit 0
----------------script end--------------
[root@test-master resource.d]# chmod 755 killnfs
[root@test-master resource.d]# ll killnfs
-rwxr-xr-x. 1 root root 79 Aug  9 21:02 killnfs
[root@test-master resource.d]# scp killnfs root@test-backup:/etc/ha.d/resource.d/
killnfs                                       100%   79     0.1KB/s   00:00
[root@test-master resource.d]# cd ..
[root@test-master ha.d]# scp haresources root@test-backup:/etc/ha.d/
haresources                                   100% 6003     5.9KB/s   00:00
Get DRBD back in order, start heartbeat again and retest: the NFS slave now follows a master switch cleanly, with no failed or hanging mounts.
Note: the overriding rule while debugging: make sure DRBD is healthy before starting heartbeat, and there will be no problems.
Note: the evolution of Ganji's image architecture.
Note: after a user uploads an image to a web server, the web server POSTs the image to the image server with the assigned ID; the PHP code on the image server receives the POSTed image, writes it to local disk and returns a success status code; on success, the front-end web server writes the image server's ID and the image path into the DB server. When a user requests a page, the image server ID and the image URL are read from the DB and the image is fetched from that image server.
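As a hedged illustration of that upload hop (the hostname, field name and upload endpoint are invented for the example):
#code=$(curl -s -o /dev/null -w '%{http_code}' -F 'img=@/tmp/upload.jpg' http://img3.example.com/upload.php)
# a 200 response means the image server wrote the file; the web server then records
# the image server's ID (3) and the image path in the DB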
Reposted from: https://blog.51cto.com/jowin/1837154