MySQL-MHA搭建、故障測試

一、架構說明

MHA（Master High Availability）是一個用于 MySQL 主從復制管理和自動故障轉移的開源工具集。MHA 的主要目的是提供 MySQL 環境的高可用性和自動故障轉移功能，確保在主庫發生故障時能夠快速切換到備庫，降低業務中斷時間。

MHA 主要由以下幾個組件組成：

1.Manager：管理節點，負責監控主庫的狀態，并處理自動故障轉移。Manager 會不斷檢查數據庫的主從同步狀態，當主庫發生故障時，會自動將備庫提升為新的主庫，并重新配置其他從庫的復制關系。
如果主從延遲，則等待所有relay log全部應用后，將VIP漂移到新主庫上，整個故障轉移過程對應用程序完全透明。

2.Node：節點，即 MySQL 實例。MHA 通過在每個 MySQL 實例上部署 MHA 腳本來對節點進行監控和操作。

3.SSH：MHA 使用 SSH 協議來進行節點之間的通信和操作。確保在使用 MHA 之前，通過 SSH 配置了節點之間的互信。

MHA 的工作流程通常如下：

1.Manager 通過 SSH 登錄到節點上監控 MySQL 主庫的狀態，包括主從復制延遲、節點健康狀況等。

2.當發現主庫發生故障或不可用時，Manager 會觸發自動故障轉移流程，將備庫晉升為新的主庫。

3.Manager 會更新其他從庫的復制關系，確保它們能夠正確復制新的主庫。
在這里插入圖片描述

主從復制（一主一從） + 增強半同步 + GTID + MHA（ping_type=INSERT）

架構	操作系統	IP	VIP	主機名	服務版本	端口	磁盤空間	內存	CPU
MHA	redhat7.9	192.168.111.34	192.168.111.36	mha-master01	mysql-8.0.35、manager-0.58、node-0.58	3307	20G	4G	8C
–	–	192.168.111.35	192.168.111.36	mha-slave01	mysql-8.0.35、manager-0.58、node-0.58	3307	20G	4G	8C

注：該方案僅支持一主一從
缺點：
1.manager管理節點存在單點故障
2. io_threads異常或者延遲時，有丟失數據的風險（主要由網絡故障引發，所以主從節點必須在同一個網段)

二、環境準備

2.1 環境準備、搭建主從在此鏈接跳轉

刪除依賴沖突
yum remove mariadb*

注意：搭建MHA需要的依賴包如下（兩個節點都安裝）

perl依賴

yum -y install perl-Module-Install perl-Module-Build

MySQL驅動

yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
yum install perl-Class-Load perl-MIME-Lite perl-Mail-Send perl-Mail-Sender perl-Mail-Sendmail perl-Params-Validate perl-Sys-Syslog perl-Email-Date-Format perl-MIME-Types

以上依賴包如果yum源沒有則手動下載安裝，依賴包可在以下網址搜索
perl-DBD-MySQL-4.023-6.el7.x86_64
perl-Config-Tiny-2.14-7.el7.noarch
perl-Log-Dispatch-2.41-1.el7.1.noarch
perl-Parallel-ForkManager-1.18-2.el7.noarch
perl-MIME-Lite-3.030-1.el7.noarch
perl-Mail-Sender-0.8.23-1.el7.noarch
perl-Mail-Sendmail-0.79-21.el7.noarch
perl-Email-Date-Format-1.002-15.el7.noarch
perl-MIME-Types-1.38-2.el7.noarch

若出現libmysqlclient.so.18報錯問題以下是解決方式。

https://rpm.pbone.net/
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Log-Dispatch-2.41-1.el7.1.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/x86_64/Packages/p/perl-Parallel-ForkManager-1.18-2.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-10-05/ppc64/Packages/p/perl-MIME-Lite-3.030-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2019-05-29/ppc64le/Packages/p/perl-Mail-Sender-0.8.23-1.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/aarch64/Packages/p/perl-Mail-Sendmail-0.79-21.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/download.fedora.redhat.com/pub/fedora/epel/7/ppc64le/Packages/p/perl-Email-Date-Format-1.002-15.el7.noarch.rpm
wget ftp://ftp.pbone.net/mirror/archive.fedoraproject.org/epel/7.2020-04-20/x86_64/Packages/p/perl-MIME-Types-1.38-2.el7.noarch.rpm安裝mysql驅動
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-common-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-5.6/MySQL-shared-5.6.51-1.el7.x86_64.rpm
wget ftp://ftp.pbone.net/mirror/dev.mysql.com/pub/Downloads/MySQL-8.0/mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpmrpm -ivh mysql-community-common-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-client-plugins-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-8.0.28-1.el7.x86_64.rpm
rpm -ivh mysql-community-libs-compat-8.0.28-1.el7.x86_64.rpm
rpm -ivh MySQL-shared-5.6.51-1.el7.x86_64.rpm <---> 與mysql-community-libs 互補

2.2 數據庫用戶權限管理

2.2.1 權限管理

在主從環境按照以下建議賦予權限，禁止賦予all privileges權限
普通用戶DDL權限控制

管理員賬號，賬號名統一為admin_root,由項目負責人管理，用于數據初始化等臨時操作

create user 'admin_root'@'%' identified by 'myttrepl@2222#TO'; 
grant alter,Alter routine,Create,Create routine,Create view,Delete,Drop,Select,Insert,Show view,Update on *.* to 'admin_root'@'%';
grant Event,Execute,Index,Lock tables,Process,References,Reload,Show databases,Trigger on *.* to 'admin_root'@'%';

應用賬號權限
#無特殊需求是，默認賦予以下權限

create user 'app_du'@'%' identified by 'myttrepl@2222#TO'; 
grant select,update,delete,insert,Show view,Trigger,Execute,Alter routine on db.* to 'app_du'@'%';
grant event on *.* to 'app_du'@'%';

如果主庫宕機后，能夠快速上線，要避免業務寫從庫，這樣主庫可以以slave身份重新加入，并保持數據同步。
處理方式：不給業務分配超級權限賬號，就可以避免從庫寫入（因為從庫始終處于read_only模式）

2.3 配置互信

方法1：使用ssh-copy命令配置（分別在2個節點執行）

ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.34
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.111.35

方法2：手工配置
執行ssh-keygen -t rsa生成一對密鑰
然后將公鑰（id_rsa.pub）分別寫入34、35兩臺機器的/root/.ssh/authorized_keys中

測試是否成功

ssh 192.168.111.34 date  && ssh 192.168.111.35 date

三、安裝MHA

3.1 創建目錄

mkdir -p /data/mha/{scripts,conf,manager}

3.2 配置在線切換、故障切換檢測腳本

vi /data/mha/scripts/master_ip_failover_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;my ($command,          $ssh_user,        $orig_master_host, $orig_master_ip,$orig_master_port, $new_master_host, $new_master_ip,    $new_master_port
);my $vip = '192.168.111.36/24';
my $key = '1';
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";GetOptions('command=s'          => \$command,'ssh_user=s'         => \$ssh_user,'orig_master_host=s' => \$orig_master_host,'orig_master_ip=s'   => \$orig_master_ip,'orig_master_port=i' => \$orig_master_port,'new_master_host=s'  => \$new_master_host,'new_master_ip=s'    => \$new_master_ip,'new_master_port=i'  => \$new_master_port,
);
exit &main();
sub main {print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";if ( $command eq "stop" || $command eq "stopssh" ) {my $exit_code = 1;eval {print "Disabling the VIP on old master: $orig_master_host \n";&stop_vip();$exit_code = 0;};if ($@) {warn "Got Error: $@\n";exit $exit_code;}exit $exit_code;}elsif ( $command eq "start" ) {my $exit_code = 10;eval {print "Enabling the VIP - $vip on the new master - $new_master_host \n";&start_vip();$exit_code = 0;};if ($@) {warn $@;exit $exit_code;}exit $exit_code;}elsif ( $command eq "status" ) {print "Checking the Status of the script.. OK \n";exit 0;}else {&usage();exit 1;}
}
sub start_vip() {`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {return 0  unless  ($ssh_user);`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}sub usage {print"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

vi /data/mha/scripts/master_ip_online_change_vip
需要修改的地方
my $vip = '192.168.111.36/24';
my $ssh_start_vip = "/usr/sbin/ifconfig ens32:$key $vip";
my $ssh_stop_vip = "/usr/sbin/ifconfig ens32:$key down";

#!/usr/bin/env perl
use strict;  
use warnings FATAL =>'all';  
use Getopt::Long;   
my $vip = '192.168.111.36/24';  # Virtual IP  
my $key = "1";  
my $ssh_start_vip = "/sbin/ifconfig ens32:$key $vip";  
my $ssh_stop_vip = "/sbin/ifconfig ens32:$key down";  
my $exit_code = 0;  
my (  $command,              $orig_master_is_new_slave, $orig_master_host,  $orig_master_ip,       $orig_master_port,         $orig_master_user,  $orig_master_password, $orig_master_ssh_user,     $new_master_host,  $new_master_ip,        $new_master_port,          $new_master_user,  $new_master_password,  $new_master_ssh_user,  
);  
GetOptions(  'command=s'                => \$command,  'orig_master_is_new_slave' => \$orig_master_is_new_slave,  'orig_master_host=s'       => \$orig_master_host,  'orig_master_ip=s'         => \$orig_master_ip,  'orig_master_port=i'       => \$orig_master_port,  'orig_master_user=s'       => \$orig_master_user,  'orig_master_password=s'   => \$orig_master_password,  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,  'new_master_host=s'        => \$new_master_host,  'new_master_ip=s'          => \$new_master_ip,  'new_master_port=i'        => \$new_master_port,  'new_master_user=s'        => \$new_master_user,  'new_master_password=s'    => \$new_master_password,  'new_master_ssh_user=s'    => \$new_master_ssh_user,  
);  
exit &main();  
sub main {  
#print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";  if ( $command eq "stop" || $command eq "stopssh" ) {  # $orig_master_host, $orig_master_ip, $orig_master_port are passed.  # If you manage master ip address at global catalog database,  # invalidate orig_master_ip here.  my $exit_code = 1;  eval {  print "\n\n\n***************************************************************\n";  print "Disabling the VIP - $vip on old master: $orig_master_host\n";  print "***************************************************************\n\n\n\n";  
&stop_vip();  $exit_code = 0;  };  if ($@) {  warn "Got Error: $@\n";  exit $exit_code;  }  exit $exit_code;  
}  
elsif ( $command eq "start" ) {  # all arguments are passed.  # If you manage master ip address at global catalog database,  # activate new_master_ip here.  # You can also grant write access (create user, set read_only=0, etc) here.  
my $exit_code = 10;  eval {  print "\n\n\n***************************************************************\n";  print "Enabling the VIP - $vip on new master: $new_master_host \n";  print "***************************************************************\n\n\n\n";  
&start_vip();  $exit_code = 0;  };  if ($@) {  warn $@;  exit $exit_code;  }  exit $exit_code;  
}  
elsif ( $command eq "status" ) {  print "Checking the Status of the script.. OK \n";  `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_start_vip \"`;  exit 0;  
}  
else {  
&usage();  exit 1;  
}  
}  
# A simple system call that enable the VIP on the new master  
sub start_vip() {  
`ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;  
}  
# A simple system call that disable the VIP on the old_master  
sub stop_vip() {  
`ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;  
}  
sub usage {  
print  
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";  
}

chmod +x /data/mha/scripts/master_ip_failover_vip
chmod +x /data/mha/scripts/master_ip_online_change_vip

3.3 安裝MHA軟件

mha4mysql-node-0.58.tar.gz —> 兩節點都安裝
mha4mysql-manager-0.58.tar.gz —> 兩節點都安裝

cd /data/mha
wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58.tar.gz
wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58.tar.gz
tar -zxvf mha4mysql-node-0.58.tar.gz
cd mha4mysql-node-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-node-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/tar -zxvf mha4mysql-manager-0.58.tar.gz
cd mha4mysql-manager-0.58
perl Makefile.PL
make && make install
ln -s /data/mha/mha4mysql-manager-0.58/bin/* /usr/local/bin/
ll /usr/local/bin/

3.4 在manager節點配置masterha.cnf、app.cnf（兩個節點都配置）

創建mha管理賬號(僅在節點1創建)

create user 'i_mha'@'192.168.%.%' identified by "myttrepl@2222#TO";
GRANT ALL PRIVILEGES ON *.* TO 'i_mha'@'192.168.%.%' with grant option;

mkdir -p /data/mha/masterha/
vi /data/mha/conf/masterha.cnf

[server default]
user=i_mha		#設置監控用戶mha
password=myttrepl@2222#TO		#設置mysql中mha用戶的密碼，這個密碼是前文中創建監控用戶的那個密碼repl_user=repl		#設置復制用戶的用戶repl
repl_password=myttrepl@2222#TO		#設置復制用戶repl的密碼
ssh_user=root		#設置ssh的登錄用戶名
ping_interval=10		#設置監控主庫，發送ping包的時間間隔10秒，默認是3秒，嘗試三次沒有回應的時候自動進行failover
ping_type=CONNECT		
remote_workdir=/data/mha/masterha/		#manager工作目錄

vi /data/mha/conf/app.cnf

manager_workdir=/data/mha/manager/		#manager工作目錄
manager_log=/data/mha/manager/manager.log	#manager日志
#workdir on the node for mysql server
master_binlog_dir=/data/mysql8.0.35/3307/binlog/		#master保存binlog的位置，這里的路徑要與master里配置的binlog的路徑一致，以便MHA能找到
#自動故障切換master腳本
master_ip_failover_script=/data/mha/scripts/master_ip_failover_vip
#手動切換時運行的腳本
master_ip_online_change_script= /data/mha/scripts/master_ip_online_change_vip
#指定檢查的從服務器IP地址
secondary_check_script = /usr/local/bin/masterha_secondary_check -s 192.168.111.34 -s 192.168.111.35	[server1]
hostname=192.168.111.34
ssh_port=22
port=3307
#設置為候選master，設置該參數以后，發生主從切換以后將會將此從庫提升為主庫，即使這個主庫不是集群中最新的slave
candidate_master=1
#默認情況下如果一個slave落后master 超過100M的relay logs的話，MHA將不會選擇該slave作為一個新的master， 因為對于這個slave的恢復需要花費很長時間；通過設置check_repl_delay=0，MHA觸發切換在選擇一個新的master的時候將會忽略復制延時，這個參數對于設置了candidate_master=1的主機非常有用，因為這個候選主在切換的過程中一定是新的master
check_repl_delay=0 #忽略relay logs日志的復制延遲[server2]
hostname=192.168.111.35
ssh_port=22
port=3307
candidate_master=1
check_repl_delay=0

3.5 檢測互信

[root@mhaserver01 conf]# masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Thu Feb 22 16:34:13 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Feb 22 16:34:13 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Thu Feb 22 16:34:13 2024 - [info] Starting SSH connection tests..
Thu Feb 22 16:34:13 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug]  Connecting via SSH from root@192.168.111.34(192.168.111.34:22) to root@192.168.111.35(192.168.111.35:22)..
Thu Feb 22 16:34:13 2024 - [debug]   ok.
Thu Feb 22 16:34:14 2024 - [debug]
Thu Feb 22 16:34:13 2024 - [debug]  Connecting via SSH from root@192.168.111.35(192.168.111.35:22) to root@192.168.111.34(192.168.111.34:22)..
Thu Feb 22 16:34:14 2024 - [debug]   ok.
Thu Feb 22 16:34:14 2024 - [info] All SSH connection tests passed successfully.

3.6 檢測主從狀態

[root@mhaserver01 conf]# masterha_check_repl --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf
Fri Feb 23 10:50:09 2024 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Fri Feb 23 10:50:09 2024 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Fri Feb 23 10:50:09 2024 - [info] MHA::MasterMonitor version 0.58.
Fri Feb 23 10:50:10 2024 - [info] GTID failover mode = 1
Fri Feb 23 10:50:10 2024 - [info] Dead Servers:
Fri Feb 23 10:50:10 2024 - [info] Alive Servers:
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.35(192.168.111.35:3307)
Fri Feb 23 10:50:10 2024 - [info] Alive Slaves:
Fri Feb 23 10:50:10 2024 - [info]   192.168.111.35(192.168.111.35:3307)  Version=8.0.35 (oldest major version between slaves) log-bin:enabled
Fri Feb 23 10:50:10 2024 - [info]     GTID ON
Fri Feb 23 10:50:10 2024 - [info]     Replicating from 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info]     Primary candidate for the new Master (candidate_master is set)
Fri Feb 23 10:50:10 2024 - [info] Current Alive Master: 192.168.111.34(192.168.111.34:3307)
Fri Feb 23 10:50:10 2024 - [info] Checking slave configurations..
Fri Feb 23 10:50:10 2024 - [info]  read_only=1 is not set on slave 192.168.111.35(192.168.111.35:3307).
Fri Feb 23 10:50:10 2024 - [info] Checking replication filtering settings..
Fri Feb 23 10:50:10 2024 - [info]  binlog_do_db= , binlog_ignore_db=
Fri Feb 23 10:50:10 2024 - [info]  Replication filtering check ok.
Fri Feb 23 10:50:10 2024 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Fri Feb 23 10:50:10 2024 - [info] Checking SSH publickey authentication settings on the current master..
Fri Feb 23 10:50:11 2024 - [info] HealthCheck: SSH to 192.168.111.34 is reachable.
Fri Feb 23 10:50:11 2024 - [info]
192.168.111.34(192.168.111.34:3307) (current master)+--192.168.111.35(192.168.111.35:3307)Fri Feb 23 10:50:11 2024 - [info] Checking replication health on 192.168.111.35..
Fri Feb 23 10:50:11 2024 - [info]  ok.
Fri Feb 23 10:50:11 2024 - [info] Checking master_ip_failover_script status:
Fri Feb 23 10:50:11 2024 - [info]   /data/mha/scripts/master_ip_failover_vip --command=status --ssh_user=root --orig_master_host=192.168.111.34 --orig_master_ip=192.168.111.34 --orig_master_port=3307IN SCRIPT TEST====/usr/sbin/ifconfig ens32:1 down==/usr/sbin/ifconfig ens32:1 192.168.111.36/24===Checking the Status of the script.. OK
Fri Feb 23 10:50:11 2024 - [info]  OK.
Fri Feb 23 10:50:11 2024 - [warning] shutdown_script is not defined.
Fri Feb 23 10:50:11 2024 - [info] Got exit code 0 (Not master dead).MySQL Replication Health is OK.

3.7 開啟MHA-manager監控進程（mhamaster02啟用manager管理）

注意：第一次啟動，主庫上的VIP 不會自動綁定，需要手動去綁定，主庫發生故障切換會進行vip的漂移

#起服務
nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &
#停服務
masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf

3.8 檢測MHA狀態

masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:28984) is running(0:PING_OK), master:192.168.111.34##查看MHA日志，看到當前matser是192.168.111.34
cat /data/mha/manager/manager.log |grep "current master"
Mon Feb 26 16:43:12 2024 - [info] Checking SSH publickey authentication settings on the current master..
192.168.111.34(192.168.111.34:3307) (current master)/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.34 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| server_id     | 343307 |
+---------------+--------+
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -h192.168.111.35 -p'y-yxoqupe33Z' -e "show variables like 'server_id'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+---------------+--------+
| Variable_name | Value  |
+---------------+--------+
| server_id     | 353307 |
+---------------+--------+

3.9 MHA-VIP 配置

mha-master01節點手動執行vip命令

#掛載vip
/sbin/ifconfig ens32:1 192.168.111.36/24#卸載vip
/usr/sbin/ip addr del 192.168.111.36/24 dev ens32:1

3.10 編寫腳本

3.10.1 登錄數據庫腳本

vi /data/mha/scripts/mha_login_mysql
#!/bin/bash
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysql --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p'y-yxoqupe33Z'ln -s /data/mha/scripts/mha_login_mysql /usr/local/bin/mha_login_mysql

3.10.2 在線切換主從腳本

vi /data/mha/scripts/mha_switchover
#!/bin/bash
masterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60ln -s /data/mha/scripts/mha_switchover /usr/local/bin/mha_switchover

chmod +x /usr/local/bin/mha_login_mysql
chmod +x /usr/local/bin/mha_switchover

四、故障模擬測試

4.1 在線手工切換（維護切換，需要把MHA監控進程關掉）：

masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnfmasterha_master_switch --master_state=alive --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=60

–orig_master_is_new_slave：把舊的master配置為從庫
–running_updates_limit=60：如果主從庫同步延遲在60s內都允許切換，但是但是切換的時間長短是由recover時relay 日志的大小決定

切換成功需要看到類似下面的提示：

[info] Switching master to 192.168.111.35(192.168.111.35:3307) completed successfully.

同時要查看VIP是否已經漂移到了新的主庫上面，新的主庫是否可讀寫，最后在新的主庫上起manager服務。

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:4996) is running(0:PING_OK), master:192.168.111.35

附：MHA在線切換的原理

1. 檢查當前的配置信息及主從服務器的信息
包括讀取MHA的配置文件/data/mha/conf/app.cnf及檢查當前slave的健康狀態2. 阻止對當前master的更新
主要通過如下步驟：2.1.等待1.5s（$time_until_kill_threads*100ms），等待當前連接斷開。2.2 執行 read_only=1，阻止新的DML操作2.3 等待0.5s，等待當前DML操作完成。2.4 kill掉所有連接。2.5 FLUSH NO_WRITE_TO_BINLOG TABLES2.6 FLUSH TABLES WITH READ LOCK3. 等待新master執行完所有的relay log
Waiting to execute all relay logs on 192.168.244.20(192.168.244.20:3306)..4. 將新master的read_only設置為off，并添加VIP5. slave切換到新master上。5.1 等待slave（192.168.244.30）應用完原主從復制產生的relay log，然后執行change master操作切換到新master上。5.2 釋放原master上加的鎖。5.3 因masterha_master_switch命令行中帶有--orig_master_is_new_slave參數，故原master也切換為新master的從。6. 清理新master的相關信息。
主要是執行了reset slave all操作，清除之前的復制信息。

4.2 故障手工切換（MHA進程沒啟動或者掛了的同時主庫也掛了）：

4.2.1 模擬場景

#主庫35停manager監控服務
masterha_stop --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf#主庫35停數據庫服務
/data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqladmin --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf -uroot -p shutdown

4.2.2 故障手工切換命令

masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --dead_master_host=192.168.111.35 --dead_master_port=3307 --master_state=dead --new_master_host=192.168.111.34  --new_master_port=3307 --ignore_last_failover

切換成功需要看到類似如下提示：

Started manual(interactive) failover.
Invalidated master IP address on 192.168.111.35(192.168.111.35:3307)
Selected 192.168.111.34(192.168.111.34:3307) as a new master.
192.168.111.34(192.168.111.34:3307): OK: Applying all logs succeeded.
192.168.111.34(192.168.111.34:3307): OK: Activated master IP address.
192.168.111.34(192.168.111.34:3307): Resetting slave info succeeded.
Master failover to 192.168.111.34(192.168.111.34:3307) completed successfully.

表示成功切換，切換成功后，查看VIP是否漂移到了從庫上(切換成功后，MHA進程會自動停止)，同時查看數據庫是否可讀寫。

故障主庫起來后，需要確認數據是否跟新的主庫一樣，如果一樣，那么就把故障主庫作為新的從庫加入新主庫下。然后在主庫上啟動MHA進程。

4.2.3 將故障主庫35數據庫服務拉起，然后加入集群成為從庫。

nohup /data/mysql8.0.35/install/mysql-8.0.35-linux-glibc2.12-x86_64/bin/mysqld_safe --defaults-file=/data/mysql8.0.35/3307/conf/my.cnf &#數據庫服務拉起來后，配置從庫
change master to
master_host='192.168.111.34',
master_port=3307,
master_user='repl',
master_password='myttrepl@2222#TO',
master_auto_position=1;start slave;	//啟動slave進程--查看slave狀態
確認IO線程、SQL線程都以運行
mysql> show slave status\G;	//查看當前的從庫狀態Slave_IO_Running: Yes			//IO線程已運行Slave_SQL_Running: Yes	//SQL線程已運行mysql> show slave status\G;	//查看當前的從庫狀態

4.2.4 主庫啟動manager服務并查看MHA狀態

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &masterha_check_status --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf
app (pid:14837) is running(0:PING_OK), master:192.168.111.34

五、MHA 日常維護命令集：

1 查看ssh 登陸是否成功

masterha_check_ssh --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

2 查看主從同步情況

masterha_check_repl  --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

3 檢查啟動的狀態

masterha_check_status  --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

4 停止mha

masterha_stop --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf

5 啟動mha

nohup masterha_manager --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --ignore_last_failover < /dev/null > /data/mha/run.log 2>&1 &

注意：當有slave 節點宕掉的情況是啟動不了的，加上–ignore_fail_on_start 即使有節點宕掉也能啟動mha，需要在配置文件中設置ignore_fail=1

6 failover 后下次重啟

每次failover 切換后會在管理目錄生成文件app.failover.complete ，下次在切換的時候會發現有這個文件導致切換不成功，需要手動清理掉。
rm -rf /masterha/app1/app1.failover.complete
也可以加上參數--ignore_last_failover

7 手工failover

手工failover 場景，master 死掉，但是masterha_manager 沒有開啟，可以通過手工failover：
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --dead_master_host=old_ip --dead_port=port --master_state=dead --new_master_host=new_ip --new_master_port=port --ignore_last_failover

8 masterha_manager 是一種監視和故障轉移的程序。另一方面,masterha_master_switch 程序不監控主庫。masterha_master_switch 可以用于主庫故障轉移,也可用于在線總開關。

9 手動在線切換(比如做維護切換時)

masterha_master_switch ---global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 --orig_master_is_new_slave
或者
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf  --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=192.168.111.34 -orig_master_is_new_slave --running_updates_limit=10000--orig_master_is_new_slave 切換時加上此參數是將原master 變為slave 節點，如果不加此參數，原來的master 將不啟動
--running_updates_limit=10000 切換時候選master 如果有延遲的話，mha 切換不能成功，加上此參數表示延遲在此時間范圍內都可切換（單位為s），但是切換的時間長短是由recover時relay 日志的大小決定手動在線切換mha，切換時需要將在運行的mha 停掉后才能切換。
在備庫先執行DDL，一般先stop slave，一般不記錄mysql 日志，可以通過set SQL_LOG_BIN =0 實現。然后進行一次主備切換操作，再在原來的主庫上執行			DDL。這種方法適用于增減索引，如果是增加字段就需要額外注意。