MHA(Master HA)是一款开源的MySQL的高可用工具,能在MySQL主从复制的基础上,实现自动化主服务器故障转移。虽然MHA试图从宕机的主服务器上保存二进制日志,但并不是总是可行的。例如,如果主服务器硬件故障或无法通过ssh访问,MHA没法保存二进制日志,只进行故障转移而丢失最新数据。
MHA监控复制架构的主服务器,一旦检测到主服务器故障,就会自动进行故障转移。即使有些从服务器没有收到最新的relay log,MHA自动从最新的从服务器上识别差异的relay log并把这些日志应用到其他从服务器上,因此所有的从服务器保持一致性了。还可以设置优先级指定其中的一台slave作为master的候选,由于MHA在slaves之间修复一致性,因此可以将slave变成新的master,其他的slave都以其作为新master。
目前MHA主要支持一主多从的架构,要搭建MHA,要求一个复制集群中必须最少有三台数据库服务器,一主二从,即一台充当master,一台充当备用master,另外一台充当从库,因为至少需要三台服务器,阿里也在该基础上进行了改造,目前淘宝TMHA已经支持一主一从。
MHA作用:
1)从宕机崩溃的master保存二进制日志事件(binlog events);
2)识别含有最新更新的slave;
3)应用差异的中继日志(relay log)到其他的slave;
4)应用从master保存的二进制日志事件(binlog events);
5)提升一个slave为新的master;
6)使其他的slave连接新的master进行复制;
MHA有两部分组成,MHA Manager(管理节点)和MHA Node(数据节点):
1:MHA Manager可以单独部署在一台独立机器上管理多个master-slave集群,也可以部署在一台slave上。MHA Manager探测集群的node节点,当master出现故障的时它可以自动将具有最新数据的slave提升为新的master,然后将所有其它的slave导向新的master上。整个故障转移过程对应用程序是透明的。
2:MHA node运行在每台MySQL服务器上(master/slave/manager),它通过监控具备解析和清理logs功能的脚本来加快故障转移的。
Manager工具包主要包括以下几个工具:
masterha_check_ssh 检查MHA的SSH配置状况
masterha_check_repl 检查MySQL复制状况
masterha_manger 启动MHA
masterha_check_status 检测当前MHA运行状态
masterha_master_monitor 检测master是否宕机
masterha_master_switch 控制故障转移(自动或者手动)
masterha_conf_host 添加或删除配置的server信息
Node工具包(这些工具通常由MHA Manager的脚本触发,无需人为操作)主要包括以下几个工具:
save_binary_logs 保存和复制master的二进制日志
apply_diff_relay_logs 识别差异的中继日志事件并将其差异的事件应用于其他的slave
filter_mysqlbinlog 去除不必要的ROLLBACK事件(MHA已不再使用这个工具)
purge_relay_logs 清除中继日志(不会阻塞SQL线程)
MHA搭建步骤如下:
1、系统初始化
防火墙和selinux都关闭,修改时区、校验时间
systemctl stop firewall
systemctl stop firewalld
cat /etc/selinux/config
SELINUX=disabled
SELINUXTYPE=targeted
[root@master tmp]# getenforce
Disabled
[root@manager ~]# crontab -l
*/30 * * * * /usr/sbin/ntpdate pool.ntp.org
[root@manager ~]# date -R
Mon, 22 Jan 2018 20:28:29 +0800
MySQL安装参见:Centos7.3编译安装MySQL 5.7.17
2、根据实际场景修改主机名,并加入/etc/hosts中
192.168.121.163 master
192.168.121.165 slave1
192.168.121.166 slave2
192.168.121.169 manager
用ssh-keygen实现四台主机之间相互免密钥登录:
生成密钥,四台机器都依次操作
[master,slave1,slave2,manager]
cd /root
ssh-keygen -t rsa
scp .ssh/id_rsa.pub master:/root/.ssh/master.pub
scp .ssh/id_rsa.pub master:/root/.ssh/slave1.pub
scp .ssh/id_rsa.pub master:/root/.ssh/manager.pub
导入公钥
cat ~/.ssh/*.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave1:/root/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave2:/root/.ssh/authorized_keys
scp ~/.ssh/authorized_keys manager:/root/.ssh/authorized_keys
最后实现直接ssh hostname即可登录主机;
3、安装MHAmha4mysql-node,mha4mysql-manager 软件包:
yum install epel-release -y
yum install perl cpan perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes -y
wget https://downloads.mariadb.com/MHA/mha4mysql-node-0.54-0.el6.noarch.rpm
wget https://downloads.mariadb.com/MHA/mha4mysql-manager-0.55-0.el6.noarch.rpm
rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm
rpm -ivh mha4mysql-manager-0.55-0.el6.noarch.rpm
4、实现master,slave1,slave2之间主从复制
参见:MySQL5.7.18基于GTID的主从复制过程实现
5、管理机manager上配置MHA文件
1.创建目录
mkdir -p /masterha/app1
mkdir /etc/masterha
vim /etc/masterha/default.cnf
[server default]
user=root
password=123456
manager_workdir=/masterha/app1
manager_log=/masterha/app1/manager.log
remote_workdir=/masterha/app1
ssh_user=root
repl_user=repl
repl_password=123456
ping_interval=1 #设置监控主库,发送ping包的时间间隔,默认是3秒,尝试三次没有回应的时候自动进行railover
shutdown_script="" //设置故障发生后关闭故障主机脚本(该脚本的主要作用是关闭主机放在发生脑裂,这里没有使用)(可省略)
master_ip_online_change_script="" //设置手动切换时候的切换脚本(可省略)
report_script="/usr/bin/masterha_report_script" //设置发生切换后发送的报警的脚本(可省略)
master_ip_failover_script="/usr/bin/masterha_ip_failover" //设置自动failover时候的切换脚本(可省略)
[server1]
hostname=slave1
master_binlog_dir=/data/mysql/binlog
candidate_master=1 //设置为候选master,如果设置该参数以后,发生主从切换以后将会将此从库提升为主库,即使这个主库不是集群中时间最新的slave
#relay_log_purge=0
[server2]
hostname=master
master_binlog_dir=/data/mysql/binlog
candidate_master=1
[server3]
hostname=slave2
master_binlog_dir=/data/mysql/binlog
no_master=1
6、masterha_check_ssh工具验证ssh信任登录是否成功
[manager]
masterha_check_ssh --conf=/etc/masterha/default.cnf
Mon Jan 22 20:48:59 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 22 20:48:59 2018 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Jan 22 20:48:59 2018 - [info] Reading server configurations from /etc/masterha/default.cnf..
Mon Jan 22 20:48:59 2018 - [info] Starting SSH connection tests..
Mon Jan 22 20:49:00 2018 - [debug]
Mon Jan 22 20:48:59 2018 - [debug] Connecting via SSH from root@slave1(192.168.121.165:22) to root@master(192.168.121.163:22)..
Mon Jan 22 20:48:59 2018 - [debug] ok.
Mon Jan 22 20:48:59 2018 - [debug] Connecting via SSH from root@slave1(192.168.121.165:22) to root@slave2(192.168.121.166:22)..
Mon Jan 22 20:49:00 2018 - [debug] ok.
Mon Jan 22 20:49:00 2018 - [debug]
Mon Jan 22 20:48:59 2018 - [debug] Connecting via SSH from root@master(192.168.121.163:22) to root@slave1(192.168.121.165:22)..
Mon Jan 22 20:49:00 2018 - [debug] ok.
Mon Jan 22 20:49:00 2018 - [debug] Connecting via SSH from root@master(192.168.121.163:22) to root@slave2(192.168.121.166:22)..
Mon Jan 22 20:49:00 2018 - [debug] ok.
Mon Jan 22 20:49:01 2018 - [debug]
Mon Jan 22 20:49:00 2018 - [debug] Connecting via SSH from root@slave2(192.168.121.166:22) to root@slave1(192.168.121.165:22)..
Mon Jan 22 20:49:00 2018 - [debug] ok.
Mon Jan 22 20:49:00 2018 - [debug] Connecting via SSH from root@slave2(192.168.121.166:22) to root@master(192.168.121.163:22)..
Mon Jan 22 20:49:01 2018 - [debug] ok.
Mon Jan 22 20:49:01 2018 - [info] All SSH connection tests passed successfully.
7、masterha_check_repl工具验证mysql复制是否成功
[manager]
masterha_check_repl --conf=/etc/masterha/default.cnf
Mon Jan 22 20:50:27 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Jan 22 20:50:28 2018 - [info] Reading application default configurations from /etc/masterha/app1.cnf..
Mon Jan 22 20:50:28 2018 - [info] Reading server configurations from /etc/masterha/default.cnf..
Mon Jan 22 20:50:28 2018 - [info] MHA::MasterMonitor version 0.55.
Mon Jan 22 20:50:29 2018 - [info] Multi-master configuration is detected. Current primary(writable) master is master(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] Master configurations are as below:
Master slave1(192.168.121.165:3306), replicating from 192.168.121.163(192.168.121.163:3306), read-only
Master master(192.168.121.163:3306), replicating from 192.168.121.165(192.168.121.165:3306)
Mon Jan 22 20:50:29 2018 - [info] Dead Servers:
Mon Jan 22 20:50:29 2018 - [info] Alive Servers:
Mon Jan 22 20:50:29 2018 - [info] slave1(192.168.121.165:3306)
Mon Jan 22 20:50:29 2018 - [info] master(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] slave2(192.168.121.166:3306)
Mon Jan 22 20:50:29 2018 - [info] Alive Slaves:
Mon Jan 22 20:50:29 2018 - [info] slave1(192.168.121.165:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Mon Jan 22 20:50:29 2018 - [info] Replicating from 192.168.121.163(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Mon Jan 22 20:50:29 2018 - [info] slave2(192.168.121.166:3306) Version=5.7.21-log (oldest major version between slaves) log-bin:enabled
Mon Jan 22 20:50:29 2018 - [info] Replicating from 192.168.121.163(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] Not candidate for the new Master (no_master is set)
Mon Jan 22 20:50:29 2018 - [info] Current Alive Master: master(192.168.121.163:3306)
Mon Jan 22 20:50:29 2018 - [info] Checking slave configurations..
Mon Jan 22 20:50:29 2018 - [warning] relay_log_purge=0 is not set on slave slave1(192.168.121.165:3306).
Mon Jan 22 20:50:29 2018 - [warning] relay_log_purge=0 is not set on slave slave2(192.168.121.166:3306).
Mon Jan 22 20:50:29 2018 - [info] Checking replication filtering settings..
Mon Jan 22 20:50:29 2018 - [info] binlog_do_db= , binlog_ignore_db=
Mon Jan 22 20:50:29 2018 - [info] Replication filtering check ok.
Mon Jan 22 20:50:29 2018 - [info] Starting SSH connection tests..
Mon Jan 22 20:50:31 2018 - [info] All SSH connection tests passed successfully.
Mon Jan 22 20:50:31 2018 - [info] Checking MHA Node version..
Mon Jan 22 20:50:32 2018 - [info] Version check ok.
Mon Jan 22 20:50:32 2018 - [info] Checking SSH publickey authentication settings on the current master..
Mon Jan 22 20:50:32 2018 - [info] HealthCheck: SSH to master is reachable.
Mon Jan 22 20:50:33 2018 - [info] Master MHA Node version is 0.54.
Mon Jan 22 20:50:33 2018 - [info] Checking recovery script configurations on the current master..
Mon Jan 22 20:50:33 2018 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/binlog --output_file=/masterha/app1/save_binary_logs_test --manager_version=0.55 --start_file=binlog.000024
Mon Jan 22 20:50:33 2018 - [info] Connecting to root@master(master)..
Creating /masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql/binlog, up to binlog.000024
Mon Jan 22 20:50:33 2018 - [info] Master setting check done.
Mon Jan 22 20:50:33 2018 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Mon Jan 22 20:50:33 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave1 --slave_ip=192.168.121.165 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.7.21-log --manager_version=0.55 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Mon Jan 22 20:50:33 2018 - [info] Connecting to root@192.168.121.165(slave1:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql/binlog, up to mysql-relay-bin.000002
Temporary relay log file is /data/mysql/binlog/mysql-relay-bin.000002
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Jan 22 20:50:34 2018 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=slave2 --slave_ip=192.168.121.166 --slave_port=3306 --workdir=/masterha/app1 --target_version=5.7.21-log --manager_version=0.55 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Mon Jan 22 20:50:34 2018 - [info] Connecting to root@192.168.121.166(slave2:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql/binlog, up to mysql-relay-bin.000003
Temporary relay log file is /data/mysql/binlog/mysql-relay-bin.000003
Testing mysql connection and privileges..mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Mon Jan 22 20:50:34 2018 - [info] Slaves settings check done.
Mon Jan 22 20:50:34 2018 - [info]
master (current master)
+--slave1
+--slave2
Mon Jan 22 20:50:34 2018 - [info] Checking replication health on slave1..
Mon Jan 22 20:50:34 2018 - [info] ok.
Mon Jan 22 20:50:34 2018 - [info] Checking replication health on slave2..
Mon Jan 22 20:50:34 2018 - [info] ok.
Mon Jan 22 20:50:34 2018 - [warning] master_ip_failover_script is not defined.
Mon Jan 22 20:50:34 2018 - [warning] shutdown_script is not defined.
Mon Jan 22 20:50:34 2018 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
8、启动MHA manager,并监控日志文件
masterha_manager --conf=/etc/masterha/default.cnf
tail -f /masterha/app1/manager.log
9、测试master(宕机后,是否会自动切换
[master]
[root@master ~]# service mysql stop
Shutting down MySQL..... SUCCESS!
宕掉master后,/masterha/app1/manager.log文件显示:
tail -f /masterha/app1/manager.log
......
Generating relay diff files from the latest slave succeeded.
slave2: OK: Applying all logs succeeded. Slave started, replicating from slave1.
slave1: Resetting slave info succeeded.
Master failover to slave1(192.168.121.165:3306) completed successfully.
上面的结果表明master成功切换。
Tag标签:「集群 mysql 高可用 MHA环境」更新时间:「2021-11-04 13:10:49」阅读次数:「974」
相关博文:
- MySQL9.0最新版单库表数量超过10000时会触发crashing的Bug
- Starting MySQL.Manager of pid-file quit without updating fi[FAILED]报错处理
- MySQL报错“Got fatal error 1236 from master when reading data from binary log...'”处理
- SQLServer2019 Always On集群报错“Connection handshake failed. An OS call failed: (8009030c) 0x8009030c”问题处理
- MySQL批量生成赋权语句