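Corosync overview:

Corosync is part of a cluster management suite. A simple configuration file defines how it passes messages, including the transport method and the protocol. It was released in 2008, but it is not really new software: the OpenAIS project, started in 2002, grew too large and was split into two subprojects, and the part that carries HA heartbeat messages became Corosync. About 60% of Corosync's code comes from OpenAIS. Corosync on its own provides complete HA functionality; more complex features still require OpenAIS. Corosync is the direction things are heading, and new projects generally adopt it. For management, hb_gui offers graphical HA administration; other graphical options are luci+ricci from the RHCS suite and LCMC, a cluster management tool written in Java. Like heartbeat, Corosync is a tool for building highly available clusters. That covers the basics of corosync and pacemaker; next we look at how to install them.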


Installing Corosync and Pacemaker:

1. Environment
(1). Operating system
    CentOS 6.5, x86_64
(2). Software
    corosync-1.4.1-17.el6.x86_64
    crmsh-1.2.6-4.el6.x86_64.rpm
    pssh-2.3.1-2.el6.x86_64.rpm
(3). Topology
Number of nodes: 3 (node1, node2, nfs)
node1: 172.16.100.6  node2: 172.16.100.7  nfs: 172.16.100.9  TestHost: 172.16.100.88
The topology is shown in the figure below:

[Figure: cluster topology]

2. Installation and configuration

2.1 Preparation
To turn a Linux host into an HA node, the following preparation is usually required:

1) Hostname resolution must work for every node, and each node's hostname must match the output of "uname -n". So make sure /etc/hosts on both nodes contains the following:

# vim /etc/hosts
172.16.100.6   node1.samlee.com node1
172.16.100.7   node2.samlee.com node2

To keep these hostnames after a reboot, also run commands like the following on each node:
On node1:

# sed -i 's@\(HOSTNAME=\).*@\1node1.samlee.com@g'  /etc/sysconfig/network
# hostname node1.samlee.com

On node2:

# sed -i 's@\(HOSTNAME=\).*@\1node2.samlee.com@g' /etc/sysconfig/network
# hostname node2.samlee.com
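
The requirement in step 1) can be checked with a small script. The following is only a sketch: hosts_has_name is a hypothetical helper name (not part of any cluster package), and it assumes a plain hosts file with one "IP name alias..." entry per line.

```shell
# hosts_has_name FILE NAME
# Succeeds if NAME appears as a hostname or alias in the hosts file FILE.
# (hypothetical helper, written for this article)
hosts_has_name() {
    awk -v name="$2" '
        /^[ \t]*#/ { next }    # skip comment lines
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}

# Typical use on a node (not executed here):
#   hosts_has_name /etc/hosts "$(uname -n)" || echo "hostname missing from /etc/hosts"
```

Running the commented check on both node1 and node2 after setting the hostnames should print nothing if /etc/hosts is consistent with "uname -n".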


2) Set up key-based ssh between the two nodes, which can be done with the following commands:
On node1:

# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node2
# ssh node2.samlee.com 'date';date

On node2:

# ssh-keygen -t rsa -P ''
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@node1
# ssh node1.samlee.com 'date';date

3) Synchronize the time automatically every 5 minutes (configure this on both node1 and node2)

# crontab -e
*/5 * * * * /sbin/ntpdate 172.16.100.10 &> /dev/null

4) Disable SELinux (configure this on both node1 and node2)

# setenforce 0
# vim /etc/selinux/config
SELINUX=disabled



2.2 Installing and configuring Corosync

1) Install Corosync and Pacemaker (with yum)

# yum -y install corosync
# yum -y install pacemaker

Install crmsh (from rpm packages)
Since RHEL 6.4 the command-line cluster configuration tool crmsh is no longer shipped; pcs is provided instead. If you are used to the crm command, download the packages and install them yourself. crmsh depends on pssh, so download that as well.

# cd /root/corosync_packages/
# yum -y --nogpgcheck localinstall crmsh*.rpm pssh*.rpm

2) Configure corosync (run these steps on node1.samlee.com)

# cd /etc/corosync/
# cp corosync.conf.example corosync.conf
# vim corosync.conf
# Please read the corosync.conf.5 manual page
compatibility: whitetank
totem {
    version: 2
    # enable authentication
    secauth: on
    # number of threads (chosen by CPU count); 0 means no threaded sending
    threads: 0
    interface {
        ringnumber: 0
        # network address of the network the cluster nodes run on
        bindnetaddr: 172.16.0.0
        # multicast address
        mcastaddr: 226.96.6.17
        # port used for heartbeat messages
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}
amf {
    mode: disabled
}
## service started together with corosync
service {
    ver:    0
    name:    pacemaker
}
## identity that ais runs as
aisexec {
    user:    root
    group:    root
}
Set the IP address after bindnetaddr in this file to the network address of the network your NIC is on. Our two nodes are on the 172.16.0.0 network, so it is set to 172.16.0.0, as follows:
bindnetaddr: 172.16.0.0
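
If you are unsure what the network address is, it can be derived from a node's IP address and prefix length. The following is a minimal sketch; network_addr is a hypothetical helper name, and the /16 prefix is taken from the 172.16.0.0 network used in this article.

```shell
# network_addr IP PREFIX
# Prints the IPv4 network address of IP/PREFIX, i.e. the value for bindnetaddr.
# (hypothetical helper, written for this article)
network_addr() {
    local ip="$1" prefix="$2" old_ifs="$IFS"
    IFS=.
    set -- $ip                 # split the dotted quad into $1..$4
    IFS="$old_ifs"
    local addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
    # build the netmask from the prefix length, limited to 32 bits
    local mask=$(( (4294967295 << (32 - prefix)) & 4294967295 ))
    local net=$(( addr & mask ))
    printf '%d.%d.%d.%d\n' \
        $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
        $(( (net >> 8) & 255 ))  $(( net & 255 ))
}

network_addr 172.16.100.6 16    # prints 172.16.0.0
```

Both node addresses (172.16.100.6 and 172.16.100.7) give the same result with a /16 prefix, which is why a single bindnetaddr value works on every node.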

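3) Generate the authentication key file used for communication between the nodes:

# corosync-keygen 
If the system does not have enough entropy, you have to keep typing on the keyboard at a logged-in console until enough has been gathered.

If corosync-keygen stalls waiting for random data while generating /etc/corosync/authkey, the following workaround can be used (for test environments only):

# mv /dev/random /dev/h
# ln /dev/urandom /dev/random
#  corosync-keygen 
# rm -rf /dev/random  
# mv /dev/h /dev/random

4) Copy corosync.conf and authkey to node2:

# scp -p corosync.conf authkey node2:/etc/corosync/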

5) Create the directory for the corosync logs on both node1 and node2

# mkdir /var/log/cluster
# ssh node2  'mkdir /var/log/cluster'

6) Start the corosync service

# service corosync start
# ssh node2 '/etc/init.d/corosync start'

7) Check that the Corosync cluster engine started properly:

# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log
# ssh node2 'grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/cluster/corosync.log'
Output like the following shows a normal start:
Aug 13 11:26:58 corosync [MAIN  ] Corosync Cluster Engine ('1.4.1'): started and ready to provide service.
Aug 13 11:26:58 corosync [MAIN  ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
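
When this check has to be repeated on several nodes, it can be wrapped in a function that returns success only if both messages are present. This is just a sketch: corosync_started is a hypothetical name, and the two patterns are copied from the sample output above.

```shell
# corosync_started LOGFILE
# Succeeds only if LOGFILE contains both startup messages shown in step 7).
# (hypothetical helper, written for this article)
corosync_started() {
    local log="$1"
    grep -q "Corosync Cluster Engine" "$log" &&
        grep -q "Successfully read main configuration file" "$log"
}

# Typical use on a node (not executed here):
#   corosync_started /var/log/cluster/corosync.log && echo "corosync is up"
```

Because it relies on the exit status rather than on reading the output, the same function can also be used in scripts run over ssh on node2.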

8) Check that the initial membership notifications were sent:

# grep  TOTEM /var/log/cluster/corosync.log
# ssh node2 'grep  TOTEM /var/log/cluster/corosync.log'
Output like the following shows they were sent:
Aug 13 13:19:20 corosync [TOTEM ] Initializing transport (UDP/IP Multicast).
Aug 13 13:19:20 corosync [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Aug 13 13:19:20 corosync [TOTEM ] The network interface [172.16.100.6] is now up.
Aug 13 13:19:20 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Aug 13 11:26:59 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.

9) Check whether any errors occurred during startup. The errors below mean that Pacemaker will soon no longer run as a corosync plugin, so cman is recommended as the cluster infrastructure service; they can be safely ignored here.

# grep ERROR: /var/log/cluster/corosync.log | grep -v unpack_resources
Aug 13 13:19:20 corosync [pcmk  ] ERROR: process_ais_conf: You have configured a cluster using the Pacemaker plugin for Corosync. The plugin is not supported in this environment and will be removed very soon.
Aug 13 13:19:20 corosync [pcmk  ] ERROR: process_ais_conf:  Please see Chapter 8 of 'Clusters from Scratch' (http://www.clusterlabs.org/doc) for details on using Pacemaker with CMAN

10) Check that pacemaker started properly:

# grep pcmk_startup /var/log/cluster/corosync.log
Aug 13 13:19:20 corosync [pcmk  ] info: pcmk_startup: CRM: Initialized
Aug 13 13:19:20 corosync [pcmk  ] Logging: Initialized pcmk_startup
Aug 13 13:19:20 corosync [pcmk  ] info: pcmk_startup: Maximum core file size is: 18446744073709551615
Aug 13 13:19:20 corosync [pcmk  ] info: pcmk_startup: Service: 9
Aug 13 13:19:20 corosync [pcmk  ] info: pcmk_startup: Local hostname: node1.samlee.com

11) If crmsh is installed, the following command shows the startup status of the cluster nodes:

# crm status
Last updated: Sat Aug 13 13:42:26 2016        Last change: Sat Aug 13 13:19:58 2016 by hacluster via crmd on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com (version 1.1.14-8.el6-70404b0) - partition with quorum
2 nodes and 0 resources configured, 2 expected votes

Online: [ node1.samlee.com node2.samlee.com ]

12) Check that the corosync port is listening:

# ss -tunlp | grep 5405
udp    UNCONN     0      0           172.16.100.6:5405                  *:*      users:(("corosync",5879,15))
udp    UNCONN     0      0            226.96.6.17:5405                  *:*      users:(("corosync",5879,11))
# ssh node2 'ss -tunlp | grep 5405'
udp    UNCONN     0      0           172.16.100.7:5405                  *:*      users:(("corosync",5047,15))
udp    UNCONN     0      0            226.96.6.17:5405                  *:*      users:(("corosync",5047,11))

The output above shows that both nodes have started properly and that the cluster is working normally.


13) The ps auxf command shows the processes started by corosync:

# ps auxf
root      5879  0.9  0.9 545200  4648 ?        Ssl  13:19   0:17 corosync
496       5884  0.0  2.1  94608 10672 ?        S<   13:19   0:00  \_ /usr/libexec/pacemaker/cib
root      5885  0.0  0.8  95148  3968 ?        S<   13:19   0:00  \_ /usr/libexec/pacemaker/stonithd
root      5886  0.0  0.5  62932  2788 ?        S<   13:19   0:00  \_ /usr/libexec/pacemaker/lrmd
496       5887  0.0  0.6  85936  3196 ?        S<   13:19   0:00  \_ /usr/libexec/pacemaker/attrd
496       5888  0.0  3.7 117468 18504 ?        S<   13:19   0:00  \_ /usr/libexec/pacemaker/pengine
496       5889  0.0  0.8 135988  4228 ?        S<   13:19   0:01  \_ /usr/libexec/pacemaker/crmd


3. Cluster resource management

crmsh basics

[root@node1 ~]# crm    ## enter crmsh
crm(live)# help        ## show help
This is crm shell, a Pacemaker command line interface.

Available commands:

    cib              manage shadow CIBs    ## CIB management
    resource         resources management    ## resource management
    configure        CRM cluster configuration    ## CRM configuration: resource stickiness, resource types, constraints, etc.
    node             nodes management    ## node management
    options          user preferences    ## user preferences
    history          CRM cluster history    ## CRM history
    site             Geo-cluster support    ## geo-cluster support
    ra               resource agents information center    ## resource agent information
    status           show cluster status    ## show cluster status
    help,?           show help (help topics for list of topics)    ## show help
    end,cd,up        go back one level    ## go back one level
    quit,bye,exit    exit the program    ## exit

crm(live)# configure     ## enter configuration mode
crm(live)configure# show     ## show the current configuration
node node1.samlee.com
node node2.samlee.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2"
crm(live)configure# verify     ## check the current configuration; it reports errors because no STONITH is defined, which can be disabled
error: unpack_resources:     Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources:     Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources:     NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
crm(live)configure# property stonith-enabled=false ## disable stonith, then check again: no errors
crm(live)configure# verify 
crm(live)configure# commit ## commit the configuration
crm(live)configure# cd
crm(live)# ra    ## enter RA (resource agent) mode
crm(live)ra# help
This level contains commands which show various information about
the installed resource agents. It is available both at the top
level and at the `configure` level.

Available commands:

    classes          list classes and providers    ## list RA classes
    list             list RA for a class (and provider)    ## list the RAs of a given class (or provider)
    meta             show meta data for a RA    ## show details of an RA
    providers        show providers for a RA and a class    ## show the providers for a given RA and class
    help             show help (help topics for list of topics)
    end              go back one level
    quit             exit the program

crm(live)ra# classes 
lsb
ocf / heartbeat pacemaker
service
stonith
crm(live)ra# list ocf pacemaker
ClusterMon     Dummy          HealthCPU      HealthSMART    Stateful       SysInfo        SystemHealth   controld
ping           pingd          remote         
crm(live)ra# info ocf:heartbeat:IPaddr
crm(live)ra# cd 
crm(live)# status ## show cluster status
Last updated: Sat Aug 13 15:51:13 2016
Last change: Sat Aug 13 15:46:19 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node2.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
0 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

The quorum problem:
In a two-node cluster the number of votes is even. When the heartbeat fails (split brain), neither node can reach quorum, and the default quorum policy stops the cluster services. To avoid this, either add votes so that the total is odd, or change the default quorum policy to "ignore".

crm(live)# configure 
crm(live)configure# property no-quorum-policy=ignore
crm(live)configure# show
node node1.samlee.com
node node2.samlee.com
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
crm(live)configure# verify 
crm(live)configure# commit
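
The arithmetic behind the quorum problem is short enough to write down. The sketch below assumes the usual majority rule (a partition has quorum when it holds more than half of the expected votes); has_quorum is a hypothetical helper name, not a Pacemaker command.

```shell
# has_quorum ONLINE TOTAL
# Succeeds when ONLINE votes are a strict majority of TOTAL expected votes.
# (hypothetical helper; assumes the plain majority rule)
has_quorum() {
    [ $(( $1 * 2 )) -gt "$2" ]
}

# After a split in a two-node cluster each side holds 1 of 2 votes.
for online in 1 2; do
    if has_quorum "$online" 2; then
        echo "$online of 2 votes: quorum"
    else
        echo "$online of 2 votes: no quorum"
    fi
done
```

With three expected votes, a partition holding two of them would still pass the check, which is why an odd total is the other way out of the problem.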

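Keeping resources from moving back after a node recovers:

When a failure occurs, resources migrate to a healthy node. Once the failed node recovers, the resources may move back to it. That is not always the best policy, because every migration involves downtime, and for complex applications such as an Oracle database the downtime is even longer. To avoid this, set a resource stickiness policy.

crm(live)configure# rsc_defaults resource-stickiness=100  ## set resource stickiness to 100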


Example: configuring a highly available web cluster

(1) Define the VIP:

crm(live)# configure 
crm(live)configure# primitive webip ocf:heartbeat:IPaddr params ip=172.16.100.99 nic=eth0 cidr_netmask=16
crm(live)configure# verify 
crm(live)configure# commit 
crm(live)configure# cd
crm(live)# status 
Last updated: Sat Aug 13 17:46:25 2016
Last change: Sat Aug 13 17:46:17 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node2.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
1 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 webip    (ocf::heartbeat:IPaddr):    Started node1.samlee.com

The last line shows that the resource has been started on node1. The ip addr show command confirms that the VIP is in effect:

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:0c:29:07:45:da brd ff:ff:ff:ff:ff:ff
    inet 172.16.100.6/16 brd 172.16.255.255 scope global eth0
    inet 172.16.100.99/16 brd 172.16.255.255 scope global secondary eth0    ## the VIP is in effect
    inet6 fe80::20c:29ff:fe07:45da/64 scope link 
       valid_lft forever preferred_lft forever

(2) Configure the httpd resource

Web service setup on node1
# yum -y install httpd
# echo "<h1>node1.samlee.com</h1>" >/var/www/html/index.html
# service httpd start
# chkconfig httpd off
# service httpd stop
Web service setup on node2
# yum -y install httpd
# echo "<h1>node2.samlee.com</h1>" >/var/www/html/index.html
# service httpd start
# chkconfig httpd off
# service httpd stop
---------------------------------------------------------------------
crm(live)# configure 
crm(live)configure# primitive webserver lsb:httpd
crm(live)configure# show
node node1.samlee.com
node node2.samlee.com
primitive webip ocf:heartbeat:IPaddr \
    params ip="172.16.100.99"
primitive webserver lsb:httpd
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6-368c726" \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"
crm(live)configure# verify 
crm(live)configure# commit 
crm(live)configure# cd
crm(live)# status 
Last updated: Sat Aug 13 17:55:46 2016
Last change: Sat Aug 13 17:55:19 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node2.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 webip    (ocf::heartbeat:IPaddr):    Started node1.samlee.com 
 webserver    (lsb:httpd):    Started node2.samlee.com

The output above shows that webip and webserver may end up on two different nodes. That does not work for a web service offered through this IP: both resources have to run on the same node. How can two resources be made to run on the same node?

(1) Manually move a resource to another node (when automatic placement does not give the desired result; for testing only)

crm(live)# resource 
crm(live)resource# list
 webip    (ocf::heartbeat:IPaddr):    Started 
 webserver    (lsb:httpd):    Started 
crm(live)resource# migrate webserver
crm(live)resource# cd
crm(live)# status
Last updated: Mon Aug 15 09:57:34 2016
Last change: Mon Aug 15 09:57:09 2016 via crm_resource on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 webip    (ocf::heartbeat:IPaddr):    Started node1.samlee.com 
 webserver    (lsb:httpd):    Started node1.samlee.com

The result after the move:

[Figure: test result after moving the resource]


(2) Create a resource group (put the resources that must start together into one group)

crm(live)# configure 
crm(live)configure# group webservice webip webserver
crm(live)configure# verify 
crm(live)configure# commit 
crm(live)configure# cd
crm(live)# resource 
crm(live)resource# list
 Resource Group: webservice
     webip    (ocf::heartbeat:IPaddr):    Started 
     webserver    (lsb:httpd):    Started 
crm(live)resource# cd
crm(live)# status
Last updated: Mon Aug 15 10:06:17 2016
Last change: Mon Aug 15 10:04:33 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 Resource Group: webservice
     webip    (ocf::heartbeat:IPaddr):    Started node1.samlee.com 
     webserver    (lsb:httpd):    Started node1.samlee.com

The test result:

[Figure: test result with the resource group]

After testing, delete the group resource:

crm(live)# resource 
crm(live)resource# stop webservice 
crm(live)resource# cleanup webservice 
crm(live)resource# cd
crm(live)# configure
crm(live)configure# delete webservice
crm(live)configure# verify
crm(live)configure# commit 
crm(live)configure# cd
crm(live)# status 
Last updated: Mon Aug 15 10:31:30 2016
Last change: Mon Aug 15 10:26:21 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 webip    (ocf::heartbeat:IPaddr):    Started node1.samlee.com 
 webserver    (lsb:httpd):    Started node2.samlee.com

## Stopping resources and clearing their records
# crm
crm(live)# resource 
crm(live)resource# stop webservice 
crm(live)resource# list
crm(live)resource# cleanup webservice 
crm(live)resource# cleanup webip 
crm(live)resource# cleanup webserver 
crm(live)resource# cd
crm(live)# node
crm(live)node# clearstate node1.samlee.com 
crm(live)node# clearstate node2.samlee.com 
crm(live)node# cd
crm(live)# resource 
crm(live)resource# start webservice
crm(live)resource# reprobe 
crm(live)resource# refresh 
crm(live)resource# cd
crm(live)# configure 
crm(live)configure# show
crm(live)configure# edit 
crm(live)configure# verify 
crm(live)configure# commit



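(3) Fine-grained resource management with resource constraints
As the example above shows, even when a cluster has all the resources it needs, it may still not handle them correctly. Resource constraints specify which cluster nodes a resource runs on, the order in which resources are loaded, and which other resources a given resource depends on. Pacemaker provides three kinds of resource constraints:
1) Resource Location: defines on which nodes a resource may run, may not run, or should preferably run;
2) Resource Colocation: defines whether cluster resources may or may not run together on the same node;
3) Resource Order: defines the order in which cluster resources start on a node;
A constraint also needs a score. Scores are a central part of how the cluster works: everything from migrating a resource to deciding which resources to stop in a degraded cluster is done by changing scores in some way. Scores are calculated per resource, and a node whose score for a resource is negative cannot run that resource. Once the scores are calculated, the cluster picks the node with the highest score. INFINITY is currently defined as 1,000,000. Adding and subtracting infinity follows three basic rules:
1) any value + INFINITY = INFINITY
2) any value - INFINITY = -INFINITY
3) INFINITY - INFINITY = -INFINITY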
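
The three rules can be expressed as a small function. This is only an illustration of the rules as stated here, not Pacemaker's own code: score_add is a hypothetical name, and scores are passed as plain integers with 1000000 and -1000000 standing for plus and minus INFINITY.

```shell
INFINITY=1000000

# score_add A B
# Adds two scores following the three INFINITY rules listed above.
# (hypothetical helper; models only those three rules)
score_add() {
    local a="$1" b="$2"
    if [ "$a" -le "-$INFINITY" ] || [ "$b" -le "-$INFINITY" ]; then
        printf '%s\n' "-$INFINITY"    # rules 2 and 3: -INFINITY always wins
    elif [ "$a" -ge "$INFINITY" ] || [ "$b" -ge "$INFINITY" ]; then
        printf '%s\n' "$INFINITY"     # rule 1
    else
        printf '%s\n' $(( a + b ))    # ordinary finite scores
    fi
}

score_add 100 1000000        # prints 1000000
score_add 100 -1000000       # prints -1000000
score_add 1000000 -1000000   # prints -1000000
```

The last call is the case worth remembering: a -INFINITY score on a node overrides any preference for it, however strong.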

When defining resource constraints, you can also give each constraint its own score. The score is the value assigned to that constraint. Constraints with higher scores are applied before those with lower scores. By creating several location constraints with different scores for a given resource, you can specify the order of the nodes that the resource fails over to.

So the earlier problem, where webip and webserver might run on different nodes, is solved by defining a colocation constraint:

crm(live)# configure 
crm(live)configure# colocation webserver_with_webip inf: webserver webip
crm(live)configure# verify 
crm(live)configure# commit 
crm(live)configure# cd
crm(live)# status 
Last updated: Mon Aug 15 11:03:31 2016
Last change: Mon Aug 15 11:02:47 2016 via cibadmin on node1.samlee.com
Stack: classic openais (with plugin)
Current DC: node1.samlee.com - partition with quorum
Version: 1.1.10-14.el6-368c726
2 Nodes configured, 2 expected votes
2 Resources configured

Online: [ node1.samlee.com node2.samlee.com ]

 webip    (ocf::heartbeat:IPaddr):    Started node1.samlee.com 
 webserver    (lsb:httpd):    Started node1.samlee.com

Both resources now run on the same node. Next, an order constraint defines the order in which they start:

## start webip first, then webserver
crm(live)configure# order webip_before_webserver mandatory: webip webserver 
crm(live)configure# verify 
crm(live)configure# commit

Check the test result:

[Figure: test result with colocation and order constraints]

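In addition, an HA cluster does not require all nodes to have the same or similar performance, so sometimes we want the service to run on a more powerful node whenever everything is healthy. This can be done with a location constraint:

crm(live)# configure 
crm(live)configure# location webip_on_node1 webip 200: node2.samlee.com
crm(live)configure# verify 
crm(live)configure# commit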
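
How a location score interacts with the resource stickiness of 100 set earlier can be illustrated with a simplified model. This sketch is an assumption-laden toy, not Pacemaker's placement algorithm: pick_node is a hypothetical helper, it considers only location scores and stickiness (no colocation, no failure counts), and ties simply go to the first node listed.

```shell
# pick_node CURRENT STICKINESS NODE=SCORE...
# Prints the node with the highest total score. The node currently running
# the resource gets STICKINESS added; nodes with a negative total are skipped.
# (hypothetical helper; simplified model for illustration only)
pick_node() {
    local current="$1" stickiness="$2" best="" best_score="" pair node score
    shift 2
    for pair in "$@"; do
        node=${pair%%=*}
        score=${pair#*=}
        [ "$node" = "$current" ] && score=$(( score + stickiness ))
        [ "$score" -lt 0 ] && continue      # negative score: node cannot run it
        if [ -z "$best" ] || [ "$score" -gt "$best_score" ]; then
            best=$node
            best_score=$score
        fi
    done
    printf '%s\n' "$best"
}

# webip runs on node1 with stickiness 100; node2 has a location score of 200.
pick_node node1.samlee.com 100 node1.samlee.com=0 node2.samlee.com=200
# prints node2.samlee.com
```

In this model a location score of 200 outweighs a stickiness of 100, so the resource moves; raising the stickiness above 200 would keep it where it is.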

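Define resource monitoring, so that the cluster notices when a service stops or is restarted:

crm(live)configure# primitive vip ocf:heartbeat:IPaddr params ip=172.16.100.100 op monitor interval=30s timeout=20s

-- This concludes part (1) of the detailed guide to using corosync for high-availability clusters.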