一、架構說明
MHA(Master High Availability)是一個用于 MySQL 主從復制管理和自動故障轉移的開源工具集。MHA 的主要目的是提供 MySQL 環境的高可用性和自動故障轉移功能,確保在主庫發生故障時能夠快速切換到備庫,降低業務中斷時間。
MHA 主要由以下幾個組件組成:
1.Manager:管理節點,負責監控主庫的狀態,并處理自動故障轉移。Manager 會不斷檢查數據庫的主從同步狀態,當主庫發生故障時,會自動將備庫提升為新的主庫,并重新配置其他從庫的復制關系。
如果主從延遲,則等待所有relay log全部應用后,將VIP漂移到新主庫上,整個故障轉移過程對應用程序完全透明。
2.Node:節點,即 MySQL 實例。MHA 通過在每個 MySQL 實例上部署 MHA 腳本來對節點進行監控和操作。
3.SSH:MHA 使用 SSH 協議來進行節點之間的通信和操作。確保在使用 MHA 之前,通過 SSH 配置了節點之間的互信。
MHA 的工作流程通常如下:
1.Manager 通過 SSH 登錄到節點上監控 MySQL 主庫的狀態,包括主從復制延遲、節點健康狀況等。
2.當發現主庫發生故障或不可用時,Manager 會觸發自動故障轉移流程,將備庫晉升為新的主庫。
3.Manager 會更新其他從庫的復制關系,確保它們能夠正確復制新的主庫。
主從復制(一主一從) + 增強半同步 + GTID + MHA(ping_type=INSERT)
架構 | 操作系統 | IP | VIP | 主機名 | 服務版本 | 端口 | 磁盤空間 | 內存 | CPU |
---|---|---|---|---|---|---|---|---|---|
MHA | redhat7.9 | 192.168.111.34 | 192.168.111.36 | mha-master01 | mysql-8.0.35、manager-0.58、node-0.58 | 3307 | 20G | 4G | 8C |
– | – | 192.168.111.35 | 192.168.111.36 | mha-slave01 | mysql-8.0.35、manager-0.58、node-0.58 | 3307 | 20G | 4G | 8C |
注:該方案僅支持一主一從
缺點:
1.manager管理節點存在單點故障
2. io_threads異常或者延遲時,有丟失數據的風險(主要由網絡故障引發,所以主從節點必須在同一個網段)
二、環境準備
2.1 環境準備、搭建主從在此鏈接跳轉
刪除依賴沖突
yum remove mariadb*
注意:搭建MHA需要的依賴包如下(兩個節點都安裝)
perl依賴
yum -y install perl-Module-Install perl-Module-Build
MySQL驅動
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
yum install perl-Class-Load perl-MIME-Lite perl-Mail-Send perl-Mail-Sender perl-Mail-Sendmail perl-Params-Validate perl-Sys-Syslog perl-Email-Date-Format perl-MIME-Types
以上依賴包如果yum源沒有則手動下載安裝,依賴包可在以下網址搜索
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-Config-Tiny-2.14-7.el7.noarch
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-MIME-Lite-3.030-1.el7.noarch
perl-Mail-Sender-0.8.23-1.el7.noarch
perl-Mail-Sendmail-0.79-21.el7.noarch
perl-Email-Date-Format-1.002-15.el7.noarch
perl-MIME-Types-1.38-2.el7.noarch
若出現libmysqlclient.so.18報錯問題以下是解決方式。
https://rpm.pbone.net/
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/x86_64/Packages/p/perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/ppc64/Packages/p/perl-MIME-Lite-3.030-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2019-05-29/ppc64le/Packages/p/perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/ppc64le/Packages/p/perl-Email-Date-Format-1.002-15.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-04-20/x86_64/Packages/p/perl-MIME-Types-1.38-2.el7.noarch.rpm安裝mysql驅動
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-common-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-5.6/MySQL-shared-5.6.51-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpmrpm -ivh mysql-community-common-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
rpm -ivh MySQL-shared-5.6.51-1.el7.x86_64.rpm <---> 與mysql-community-libs 互補
2.2 數據庫用戶權限管理
2.2.1 權限管理
在主從環境按照以下建議賦予權限,禁止賦予all privileges權限
普通用戶DDL權限控制
管理員賬號,賬號名統一為admin_root,由項目負責人管理,用于數據初始化等臨時操作
create user 'admin_root'@'%' identified by 'myttrepl@2222#TO';
grant alter,Alter routine,Create,Create routine,Create view,Delete,Drop,Select,Insert,Show view,Update on *.* to 'admin_root'@'%';
grant Event,Execute,Index,Lock tables,Process,References,Reload,Show databases,Trigger on *.* to 'admin_root'@'%';
應用賬號權限
#無特殊需求是,默認賦予以下權限
create user 'app_du'@'%' identified by 'myttrepl@2222#TO';
grant select,update,delete,insert,Show view,Trigger,Execute,Alter routine on db.* to 'app_du'@'%';
grant event on *.* to 'app_du'@'%';
如果主庫宕機后,能夠快速上線,要避免業務寫從庫,這樣主庫可以以slave身份重新加入,并保持數據同步。
處理方式:不給業務分配超級權限賬號,就可以避免從庫寫入(因為從庫始終處于read_only模式)
2.3 配置互信
方法1:使用ssh-copy命令配置(分別在2個節點執行)
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.34
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.35
方法2:手工配置
執行ssh-keygen -t rsa生成一對密鑰
然后將公鑰(id_rsa.pub)分別寫入34、35兩臺機器的/root/.ssh/authorized_keys中
測試是否成功
ssh 192.168.111.34 date && ssh 192.168.111.35 date
三、安裝MHA
3.1 創建目錄
mkdir -p /data/mha/{scripts,conf,manager}
3.2 配置在線切換、故障切換檢測腳本
vi /data/mha/scripts/master_ip_failover_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;my ($command, $ssh_user, $orig_master_host, $orig_master_ip,$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);my $vip = '192.168.111.36/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";GetOptions('command=s' => \$command,'ssh_user=s' => \$ssh_user,'orig_master_host=s' => \$orig_master_host,'orig_master_ip=s' => \$orig_master_ip,'orig_master_port=i' => \$orig_master_port,'new_master_host=s' => \$new_master_host,'new_master_ip=s' => \$new_master_ip,'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";if ( $command eq "stop" || $command eq "stopssh" ) {my $exit_code = 1;eval {print "Disabling the VIP on old master: $orig_master_host \n";&stop_vip();$exit_code = 0;};if ($@) {warn "Got Error: $@\n";exit $exit_code;}exit $exit_code;}elsif ( $command eq "start" ) {my $exit_code = 10;eval {print "Enabling the VIP - $vip on the new master - $new_master_host \n";&start_vip();$exit_code = 0;};if ($@) {warn $@;exit $exit_code;}exit $exit_code;}elsif ( $command eq "status" ) {print "Checking the Status of the script.. OK \n";exit 0;}else {&usage();exit 1;}
}
sub start_vip() {`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {return 0 unless ($ssh_user);`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}sub usage {print"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
vi /data/mha/scripts/master_ip_online_change_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";
#!/usr/bin/env perl
use strict;
use warnings FATAL =>'all';
use Getopt::Long;
my $vip = '192.168.111.36/24'; # Virtual IP
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";
my $exit_code = 0;
my ( $command, $orig_master_is_new_slave, $orig_master_host, $orig_master_ip, $orig_master_port, $orig_master_user, $orig_master_password, $orig_master_ssh_user, $new_master_host, $new_master_ip, $new_master_port, $new_master_user, $new_master_password, $new_master_ssh_user,
);
GetOptions( 'command=s' => \$command, 'orig_master_is_new_slave' => \$orig_master_is_new_slave, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_user=s' => \$orig_master_user, 'orig_master_password=s' => \$orig_master_password, 'orig_master_ssh_user=s' => \$orig_master_ssh_user, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_user=s' => \$new_master_user, 'new_master_password=s' => \$new_master_password, 'new_master_ssh_user=s' => \$new_master_ssh_user,
);
exit &main();
sub main {
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { # $orig_master_host, $orig_master_ip, $orig_master_port are passed. # If you manage master ip address at global catalog database, # invalidate orig_master_ip here. my $exit_code = 1; eval { print "\n\n\n***************************************************************\n"; print "Disabling the VIP - $vip on old master: $orig_master_host\n"; print "***************************************************************\n\n\n\n";
&stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code;
}
elsif ( $command eq "start" ) { # all arguments are passed. # If you manage master ip address at global catalog database, # activate new_master_ip here. # You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10; eval { print "\n\n\n***************************************************************\n"; print "Enabling the VIP - $vip on new master: $new_master_host \n"; print "***************************************************************\n\n\n\n";
&start_vip(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code;
}
elsif ( $command eq "status" ) { print "Checking the Status of the script.. OK \n"; `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`; exit 0;
}
else {
&usage(); exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
chmod +x /data/mha/scripts/master_ip_failover_vip
chmod +x /data/mha/scripts/master_ip_online_change_vip
3.3 安裝MHA軟件
mha4mysql-node-0.58.tar.gz —> 兩節點都安裝
mha4mysql-manager-0.58.tar.gz —> 兩節點都安裝
cd /data/mha
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
tar -zxvf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-node-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/tar -zxvf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-manager-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/
3.4 在manager節點配置masterha.cnf、app.cnf(兩個節點都配置)
創建mha管理賬號(僅在節點1創建)
create user 'i_mha'@'192.168.%.%' identified by "myttrepl@2222#TO";
GRANT ALL PRIVILEGES ON *.* TO 'i_mha'@'192.168.%.%' with grant option;
mkdir -p /data/mha/masterha/
vi /data/mha/conf/masterha.cnf
[server default]
user=i_mha #設置監控用戶mha
password=myttrepl@2222#TO #設置mysql中mha用戶的密碼,這個密碼是前文中創建監控用戶的那個密碼repl_user=repl #設置復制用戶的用戶repl
repl_password=myttrepl@2222#TO #設置復制用戶repl的密碼
ssh_user=root #設置ssh的登錄用戶名
ping_interval=10 #設置監控主庫,發送ping包的時間間隔10秒,默認是3秒,嘗試三次沒有回應的時候自動進行failover
ping_type=CONNECT
remote_workdir=/data/mha/masterha/ #manager工作目錄
vi /data/mha/conf/app.cnf
manager_workdir=/data/mha/manager/ #manager工作目錄
manager_log=/data/mha/manager/manager.log #manager日志
#workdir on the node for mysql server
master_binlog_dir=/data/mysql8.0.35/3307/binlog/ #master保存binlog的位置,這里的路徑要與master里配置的binlog的路徑一致,以便MHA能找到
#自動故障切換master腳本
master_ip_failover_script=/data/mha/scripts/master_ip_failover_vip
#手動切換時運行的腳本
master_ip_online_change_script= /data/mha/scripts/master_ip_online_change_vip
#指定檢查的從服務器IP地址
secondary_check_script = /usr/local/bin/masterha_secondary_check -s 192.168.111.34 -s 192.168.111.35 [server1]
hostname=192.168.111.34
ssh_port=22
port=3307
#設置為候選master,設置該參數以后,發生主從切換以后將會將此從庫提升為主庫,即使這個主庫不是集群中最新的slave
candidate_master=1
#默認情況下如果一個slave落后master 超過100M的relay logs的話,MHA將不會選擇該slave作為一個新的master, 因為對于這個slave的恢復需要花費很長時間;通過設置check_repl_delay=0,MHA觸發切換在選擇一個新的master的時候將會忽略復制延時,這個參數對于設置了candidate_master=1的主機非常有用,因為這個候選主在切換的過程中一定是新的master
check_repl_delay=0 #忽略relay logs日志的復制延遲[server2]
hostname=192.168.111.35
ssh_port=22
port=3307
candidate_master=1
check_repl_delay=0
3.5 檢測互信
[root@mhaserver01 conf]# masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Thu Feb 22 16:34:13 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Feb 22 16:34:13 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Starting SSH connection tests..
Thu Feb 22 16:34:13 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug] Connecting via SSH from root@192.168.111.34(192.168.111.34:22) to root@192.168.111.35(192.168.111.35:22)..
Thu Feb 22 16:34:13 2024 - [debug] ok.
Thu Feb 22 16:34:14 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug] Connecting via SSH from root@192.168.111.35(192.168.111.35:22) to root@192.168.111.34(192.168.111.34:22)..
Thu Feb 22 16:34:14 2024 - [debug] ok.
Thu Feb 22 16:34:14 2024 - [info] All SSH connection tests passed successfully.
3.6 檢測主從狀態
[root@mhaserver01 conf]# masterha_check_repl --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Fri Feb 23 10:50:09 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 23 10:50:09 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 23 10:50:10 2024 - [info] GTID failover mode = 1
Fri Feb 23 10:50:10 2024 - [info] Dead Servers:
Fri Feb 23 10:50:10 2024 - [info] Alive Servers:
Fri Feb 23 10:50:10 2024 - [info] 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info] 192.168.111.35(192.168.111.35:3307)
Fri Feb 23 10:50:10 2024 - [info] Alive Slaves:
Fri Feb 23 10:50:10 2024 - [info] 192.168.111.35(192.168.111.35:3307) Version=8.0.35 (oldest major version between slaves) log-bin:enabled
Fri Feb 23 10:50:10 2024 - [info] GTID ON
Fri Feb 23 10:50:10 2024 - [info] Replicating from 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info] Primary candidate for the new Master (candidate_master is set)
Fri Feb 23 10:50:10 2024 - [info] Current Alive Master: 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info] Checking slave configurations..
Fri Feb 23 10:50:10 2024 - [info] read_only=1 is not set on slave 192.168.111.35(192.168.111.35:3307).
Fri Feb 23 10:50:10 2024 - [info] Checking replication filtering settings..
Fri Feb 23 10:50:10 2024 - [info] binlog_do_db= , binlog_ignore_db=
Fri Feb 23 10:50:10 2024 - [info] Replication filtering check ok.
Fri Feb 23 10:50:10 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 23 10:50:10 2024 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 23 10:50:11 2024 - [info] HealthCheck: SSH to 192.168.111.34 is reachable.
Fri Feb 23 10:50:11 2024 - [info]
192.168.111.34(192.168.111.34:3307) (current master)+--192.168.111.35(192.168.111.35:3307)Fri Feb 23 10:50:11 2024 - [info] Checking replication health on 192.168.111.35..
Fri Feb 23 10:50:11 2024 - [info] ok.
Fri Feb 23 10:50:11 2024 - [info] Checking master_ip_failover_script status:
Fri Feb 23 10:50:11 2024 - [info] /data/mha/scripts/master_ip_failover_vip --command=status --ssh_user=root --orig_master_host=192.168.111.34 --orig_master_ip=192.168.111.34 --orig_master_port=3307IN SCRIPT TEST====/usr/sbin/ifconfig ens32:1 down==/usr/sbin/ifconfig ens32:1 192.168.111.36/24===Checking the Status of the script.. OK
Fri Feb 23 10:50:11 2024 - [info] OK.
Fri Feb 23 10:50:11 2024 - [warning] shutdown_script is not defined.
Fri Feb 23 10:50:11 2024 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.
3.7 開啟MHA-manager監控進程(mhamaster02啟用manager管理)
注意:第一次啟動,主庫上的VIP 不會自動綁定,需要手動去綁定,主庫發生故障切換會進行vip的漂移
#起服務
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
#停服務
masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
3.8 檢測MHA狀態
masterha_check_status --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
app (pid:28984) is running(0:PING_OK), master:192.168.111.34##查看MHA日志,看到當前matser是192.168.111.34
cat /data/mha/manager/manager.log |grep "current master"
Mon Feb 26 16:43:12 2024 - [info] Checking SSH publickey authentication settings on the current master..
192.168.111.34(192.168.111.34:3307) (current master)/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.34 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value |
+---------------+--------+
| server_id | 343307 |
+---------------+--------+
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.35 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value |
+---------------+--------+
| server_id | 353307 |
+---------------+--------+
3.9 MHA-VIP 配置
mha-master01節點手動執行vip命令
#掛載vip
/sbin/ifconfig ens32:1 192.168.111.36/24#卸載vip
/usr/sbin/ip addr del 192.168.111.36/24 dev ens32:1
3.10 編寫腳本
3.10.1 登錄數據庫腳本
vi /data/mha/scripts/mha_login_mysql
#!/bin/bash
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p'y-yxoqupe33Z'ln -s /data/mha/scripts/mha_login_mysql /usr/local/bin/mha_login_mysql
3.10.2 在線切換主從腳本
vi /data/mha/scripts/mha_switchover
#!/bin/bash
masterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60ln -s /data/mha/scripts/mha_switchover /usr/local/bin/mha_switchover
chmod +x /usr/local/bin/mha_login_mysql
chmod +x /usr/local/bin/mha_switchover
四、故障模擬測試
4.1 在線手工切換(維護切換,需要把MHA監控進程關掉):
masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnfmasterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60
–orig_master_is_new_slave:把舊的master配置為從庫
–running_updates_limit=60:如果主從庫同步延遲在60s內都允許切換,但是但是切換的時間長短是由recover時relay 日志的大小決定
切換成功需要看到類似下面的提示:
[info] Switching master to 192.168.111.35(192.168.111.35:3307) completed successfully.
同時要查看VIP是否已經漂移到了新的主庫上面,新的主庫是否可讀寫,最后在新的主庫上起manager服務。
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &masterha_check_status --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
app (pid:4996) is running(0:PING_OK), master:192.168.111.35
附:MHA在線切換的原理
1. 檢查當前的配置信息及主從服務器的信息
包括讀取MHA的配置文件/data/mha/conf/app.cnf及檢查當前slave的健康狀態2. 阻止對當前master的更新
主要通過如下步驟:2.1.等待1.5s($time_until_kill_threads*100ms),等待當前連接斷開。2.2 執行 read_only=1,阻止新的DML操作2.3 等待0.5s,等待當前DML操作完成。2.4 kill掉所有連接。2.5 FLUSH NO_WRITE_TO_BINLOG TABLES2.6 FLUSH TABLES WITH READ LOCK3. 等待新master執行完所有的relay log
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..4. 將新master的read_only設置為off,并添加VIP5. slave切換到新master上。5.1 等待slave(192.168.244.30)應用完原主從復制產生的relay log,然后執行change master操作切換到新master上。5.2 釋放原master上加的鎖。5.3 因masterha_master_switch命令行中帶有--orig_master_is_new_slave參數,故原master也切換為新master的從。6. 清理新master的相關信息。
主要是執行了reset slave all操作,清除之前的復制信息。
4.2 故障手工切換(MHA進程沒啟動或者掛了的同時主庫也掛了):
4.2.1 模擬場景
#主庫35停manager監控服務
masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf#主庫35停數據庫服務
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqladmin --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p shutdown
4.2.2 故障手工切換命令
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --dead_master_host=192.168.111.35 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.111.34 --new_master_port=3307 --ignore_last_failover
切換成功需要看到類似如下提示:
Started manual(interactive) failover.
Invalidated master IP address on 192.168.111.35(192.168.111.35:3307)
Selected 192.168.111.34(192.168.111.34:3307) as a new master.
192.168.111.34(192.168.111.34:3307): OK: Applying all logs succeeded.
192.168.111.34(192.168.111.34:3307): OK: Activated master IP address.
192.168.111.34(192.168.111.34:3307): Resetting slave info succeeded.
Master failover to 192.168.111.34(192.168.111.34:3307) completed successfully.
表示成功切換,切換成功后,查看VIP是否漂移到了從庫上(切換成功后,MHA進程會自動停止),同時查看數據庫是否可讀寫。
故障主庫起來后,需要確認數據是否跟新的主庫一樣,如果一樣,那么就把故障主庫作為新的從庫加入新主庫下。然后在主庫上啟動MHA進程。
4.2.3 將故障主庫35數據庫服務拉起,然后加入集群成為從庫。
nohup /data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqld_safe --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf &#數據庫服務拉起來后,配置從庫
change master to
master_host='192.168.111.34',
master_port=3307,
master_user='repl',
master_password='myttrepl@2222#TO',
master_auto_position=1;start slave; //啟動slave進程--查看slave狀態
確認IO線程、SQL線程都以運行
mysql> show slave status\G; //查看當前的從庫狀態Slave_IO_Running: Yes //IO線程已運行Slave_SQL_Running: Yes //SQL線程已運行mysql> show slave status\G; //查看當前的從庫狀態
4.2.4 主庫啟動manager服務并查看MHA狀態
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &masterha_check_status --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
app (pid:14837) is running(0:PING_OK), master:192.168.111.34
五、MHA 日常維護命令集:
1 查看ssh 登陸是否成功
masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
2 查看主從同步情況
masterha_check_repl --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
3 檢查啟動的狀態
masterha_check_status --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
4 停止mha
masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
5 啟動mha
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
注意:當有slave 節點宕掉的情況是啟動不了的,加上–ignore_fail_on_start 即使有節點宕掉也能啟動mha,需要在配置文件中設置ignore_fail=1
6 failover 后下次重啟
每次failover 切換后會在管理目錄生成文件app.failover.complete ,下次在切換的時候會發現有這個文件導致切換不成功,需要手動清理掉。
rm -rf /masterha/app1/app1.failover.complete
也可以加上參數--ignore_last_failover
7 手工failover
手工failover 場景,master 死掉,但是masterha_manager 沒有開啟,可以通過手工failover:
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --dead_master_host=old_ip --dead_port=port --master_state=dead --new_master_host=new_ip --new_master_port=port --ignore_last_failover
8 masterha_manager 是一種監視和故障轉移的程序。另一方面,masterha_master_switch 程序不監控主庫。masterha_master_switch 可以用于主庫故障轉移,也可用于在線總開關。
9 手動在線切換(比如做維護切換時)
masterha_master_switch ---global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 --orig_master_is_new_slave
或者
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 -orig_master_is_new_slave --running_updates_limit=10000--orig_master_is_new_slave 切換時加上此參數是將原master 變為slave 節點,如果不加此參數,原來的master 將不啟動
--running_updates_limit=10000 切換時候選master 如果有延遲的話,mha 切換不能成功,加上此參數表示延遲在此時間范圍內都可切換(單位為s),但是切換的時間長短是由recover時relay 日志的大小決定手動在線切換mha,切換時需要將在運行的mha 停掉后才能切換。
在備庫先執行DDL,一般先stop slave,一般不記錄mysql 日志,可以通過set SQL_LOG_BIN =0 實現。然后進行一次主備切換操作,再在原來的主庫上執行 DDL。這種方法適用于增減索引,如果是增加字段就需要額外注意。