Mysql MHA高可用集群搭建

发表于 2020-10-13 更新于 2020-10-15 分类于数据库，运维 Waline：

本文记录一下Mysql高可用集群搭建过程。4台主机，1主、2从、1 MHA，实现一主两从的半同步复制，使用MHA实现主机出故障，从库能自动切换

本文记录一下Mysql高可用集群搭建过程。4台主机，1主、2从、1 MHA，实现一主两从的半同步复制，使用MHA实现主机出故障，从库能自动切换

1.环境说明

1.1 操作系统和软件版本

操作系统：CentOS7.8

Mysql：5.7.28

mha版本：mha4mysql-node-0.58-0.el7.centos.noarch.rpm mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

1.2 主机角色介绍

Master：192.168.1.161 mysql主节点

Slave1：192.168.1.162 mysql从节点1

Slave2：192.168.1.163 mysql从节点2

mha：192.168.1.164 mha manager节点

2.一主双从半同步复制

2.1 master节点配置文件修改

192.168.1.161节点上执行：

[root@master ~]# vi /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#myslq服务器id
server-id=1
#开启binlog
log-bin=mysql-bin
#开启半同步复制
rpl_semi_sync_master_enabled=1
#跳过名称反向解析
skip_name_resolve=1
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock

# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

保存退出，先不要重启mysql。

2.2 master节点安装半同步插件

192.168.1.161节点上执行：

[root@master ~]# mysql -u root -p
Enter password: 此处输入mysql密码
#执行安装插件
mysql> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
#退出
mysql> quit
#重启mysql
[root@master ~]# systemctl restart mysqld
[root@master ~]# mysql -u root -p
Enter password: 此处输入mysql密码
#查看主节点状态
mysql> show master status \G;

记录file和position

2.3 slave节点配置文件修改

192.168.1.162和192.168.1.163两台从节点上执行

[root@slave1 ~]# vi /etc/my.cnf
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#myslq服务器id
server-id=2
#开启binlog
log-bin=mysql-bin
#
relay_log=mysql-relay-bin
#配置为只读模式,mha故障转移主节点到此台机器会自动关闭只读模式
read_only=1
#开启半同步复制
rpl_semi_sync_slave_enabled=1
#跳过名称反向解析
skip_name_resolve=1
#不清空中继日志
relay_log_purge=0
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M

保存退出，暂时不要重启mysql

2.4 slave节点安装半同步复制插件

192.168.1.162和192.168.1.163节点上执行：

[root@master ~]# mysql -u root -p
Enter password: 此处输入mysql密码
#执行安装插件
mysql> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
#退出
mysql> quit
#重启mysql
[root@master ~]# systemctl restart mysqld
[root@master ~]# mysql -u root -p
Enter password: 此处输入mysql密码
#查看主节点状态
mysql> show master status \G;

2.5 开启主从同步

2.5.1 master节点操作

192.168.1.161节点上执行：

[root@master ~]# mysql -u root -p
Enter password: 此处输入mysql密码
#启同步账户和授权
mysql> grant REPLICATION SLAVE on *.* to 'root'@'%' identified by 'Mysql@123';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'Mysql@123' WITH GRANT
OPTION;
#查看主节点状态
mysql> show master status \G;

2.5.2 slave节点操作

192.168.1.162和192.168.1.163节点上执行：

[root@master ~]# mysql -u root -p
Enter password: 此处输入mysql密码
#master上查看到binlog文件为mysql-bin.000011 偏移量为 154
mysql>CHANGE MASTER TO MASTER_HOST='192.168.1.161',MASTER_PORT=3306,MASTER_USER='root',MASTER_PASSWORD='Mysql@123',MASTER_LOG_FILE='mysql-bin.000011',MASTER_LOG_POS=154;
#开启 slave
mysql> start slave;
Query OK, 0 rows affected (0.00 sec)
#查看 slave状态，Slave_IO_Running: Yes  Slave_SQL_Running: Yes 则状态正常
mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.161
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000011
          Read_Master_Log_Pos: 154
               Relay_Log_File: mysql-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000011
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 154
              Relay_Log_Space: 527
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 1
                  Master_UUID: 5c51c3ed-09db-11eb-bfa2-000c2904c259
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

ERROR: 
No query specified

mysql>

2.3 验证主从同步功能

在192.168.1.161上执行建库建表

mysql> create database test;
Query OK, 1 row affected (0.00 sec)

mysql> use test;
Database changed
mysql> CREATE TABLE `position` (
     `id` int(11) NOT NULL,
     `name` varchar(30) DEFAULT NULL,
     `salary` varchar(30) DEFAULT NULL,
     `city` varchar(30) DEFAULT NULL,
     PRIMARY KEY (`id`)
     ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> CREATE TABLE `position_detail` (
     `id` int(11) NOT NULL,
     `pid` int(11) DEFAULT NULL,
     `description` text,
     PRIMARY KEY (`id`)
     ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+-----------------+
| Tables_in_test  |
+-----------------+
| position        |
| position_detail |
+-----------------+
2 rows in set (0.00 sec)

mysql>

在192.168.1.162和192.168.1.163上查看库表是否创建

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> show tables;
+-----------------+
| Tables_in_test  |
+-----------------+
| position        |
| position_detail |
+-----------------+
2 rows in set (0.00 sec)

mysql>

一主双从搭建完成。

3.搭建MHA

3.1 配置三台Mysql节点ssh公钥互信

#在192.168.1.161,192.168.1.162,192.168.1.163三台服务器执行如下命令
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.161
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.162
ssh-copy-id -i /root/.ssh/id_rsa.pub root@192.168.1.163

3.2 MHA安装

在192.168.1.161,192.168.1.162,192.168.1.163,192.168.1.164四台服务器执行如下命令，安装mha4mysql-node

#安装依赖
yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
#下载并安装mha4mysql-node
wget https://qiniu.wsfnk.com/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm

在192.168.1.164 MHA manager节点服务器执行如下命令，安装mha4mysql-manager

1
2
3

#下载并安装mha4mysql-manager
wget https://qiniu.wsfnk.com/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm

3.3 MHA Manager 端配置

在192.168.1.164 manager节点执行如下操作，注意，实际配置时将下面配置文件的注释信息去掉，否则部分命令会报错

[root@mha ~]#  vi /etc/mha/cluster01.conf 
[server default]
#管理节点的用户
user=root 
#管理节点的密码
password=Mysql@123
#本地工作目录,会自动生成
manager_workdir=/data/mastermha/cluster01/
#mha日志
manager_log=/data/mastermha/cluster01/manager.log
#在远程主机的工作目录
remote_workdir=/data/mastermha/cluster01/
#MySQL二进制日志存储位置
master_binlog_dir=/var/lib/mysql
#ssh登录用户，就是我们上面配置免密码登录的账号
ssh_user=root
#MySQL复制用户的账号,这个用户应该在master和slave节点应该都存在
repl_user=root
#MySQL复制用户的密码
repl_password=Mysql@123
#指定ping间隔时间
ping_interval=1
[server1]
hostname=192.168.1.161
#如果配置该项表示它们有资格被选取为主服务器的,如果不配置表示无权限成为master
candidate_master=1
[server2]
hostname=192.168.1.162
candidate_master=1
[server3]
hostname=192.168.1.163
candidate_master=1

3.4 MHA验证及启动

3.4.1 检查mha的ssh免密登录状态

### 检查mha的ssh免密登录状态
[root@mha ~]# masterha_check_ssh --conf=/etc/mha/cluster01.conf
Sun Oct 11 10:07:16 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Oct 11 10:07:16 2020 - [info] Reading application default configuration from /etc/mha/cluster01.conf..
Sun Oct 11 10:07:16 2020 - [info] Reading server configuration from /etc/mha/cluster01.conf..
Sun Oct 11 10:07:16 2020 - [info] Starting SSH connection tests..
Sun Oct 11 10:07:18 2020 - [debug] 
Sun Oct 11 10:07:16 2020 - [debug]  Connecting via SSH from root@192.168.1.162(192.168.1.162:22) to root@192.168.1.161(192.168.1.161:22)..
Sun Oct 11 10:07:17 2020 - [debug]   ok.
Sun Oct 11 10:07:17 2020 - [debug]  Connecting via SSH from root@192.168.1.162(192.168.1.162:22) to root@192.168.1.163(192.168.1.163:22)..
Sun Oct 11 10:07:18 2020 - [debug]   ok.
Sun Oct 11 10:07:18 2020 - [debug] 
Sun Oct 11 10:07:16 2020 - [debug]  Connecting via SSH from root@192.168.1.161(192.168.1.161:22) to root@192.168.1.162(192.168.1.162:22)..
Sun Oct 11 10:07:16 2020 - [debug]   ok.
Sun Oct 11 10:07:16 2020 - [debug]  Connecting via SSH from root@192.168.1.161(192.168.1.161:22) to root@192.168.1.163(192.168.1.163:22)..
Sun Oct 11 10:07:17 2020 - [debug]   ok.
Sun Oct 11 10:07:19 2020 - [debug] 
Sun Oct 11 10:07:17 2020 - [debug]  Connecting via SSH from root@192.168.1.163(192.168.1.163:22) to root@192.168.1.161(192.168.1.161:22)..
Sun Oct 11 10:07:17 2020 - [debug]   ok.
Sun Oct 11 10:07:17 2020 - [debug]  Connecting via SSH from root@192.168.1.163(192.168.1.163:22) to root@192.168.1.162(192.168.1.162:22)..
Sun Oct 11 10:07:18 2020 - [debug]   ok.
Sun Oct 11 10:07:19 2020 - [info] All SSH connection tests passed successfully.

3.4.2 检查主备库的复制情况

[root@mha ~]# masterha_check_repl --conf=/etc/mha/cluster01.conf
Sun Oct 11 10:08:50 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Oct 11 10:08:50 2020 - [info] Reading application default configuration from /etc/mha/cluster01.conf..
Sun Oct 11 10:08:50 2020 - [info] Reading server configuration from /etc/mha/cluster01.conf..
Sun Oct 11 10:08:50 2020 - [info] MHA::MasterMonitor version 0.58.
Sun Oct 11 10:08:51 2020 - [info] Multi-master configuration is detected. Current primary(writable) master is 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:08:51 2020 - [info] Master configurations are as below: 
Master 192.168.1.162(192.168.1.162:3306), replicating from 192.168.1.161(192.168.1.161:3306), read-only
Master 192.168.1.161(192.168.1.161:3306), replicating from 192.168.1.162(192.168.1.162:3306)

Sun Oct 11 10:08:51 2020 - [info] GTID failover mode = 0
Sun Oct 11 10:08:51 2020 - [info] Dead Servers:
Sun Oct 11 10:08:51 2020 - [info] Alive Servers:
Sun Oct 11 10:08:51 2020 - [info]   192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:08:51 2020 - [info]   192.168.1.162(192.168.1.162:3306)
Sun Oct 11 10:08:51 2020 - [info]   192.168.1.163(192.168.1.163:3306)
Sun Oct 11 10:08:51 2020 - [info] Alive Slaves:
Sun Oct 11 10:08:51 2020 - [info]   192.168.1.162(192.168.1.162:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:08:51 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:08:51 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:08:51 2020 - [info]   192.168.1.163(192.168.1.163:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:08:51 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:08:51 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:08:51 2020 - [info] Current Alive Master: 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:08:51 2020 - [info] Checking slave configurations..
Sun Oct 11 10:08:51 2020 - [info] Checking replication filtering settings..
Sun Oct 11 10:08:51 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Sun Oct 11 10:08:51 2020 - [info]  Replication filtering check ok.
Sun Oct 11 10:08:51 2020 - [info] GTID (with auto-pos) is not supported
Sun Oct 11 10:08:51 2020 - [info] Starting SSH connection tests..
Sun Oct 11 10:08:55 2020 - [info] All SSH connection tests passed successfully.
Sun Oct 11 10:08:55 2020 - [info] Checking MHA Node version..
Sun Oct 11 10:08:55 2020 - [info]  Version check ok.
Sun Oct 11 10:08:55 2020 - [info] Checking SSH publickey authentication settings on the current master..
Sun Oct 11 10:08:56 2020 - [info] HealthCheck: SSH to 192.168.1.161 is reachable.
Sun Oct 11 10:08:56 2020 - [info] Master MHA Node version is 0.58.
Sun Oct 11 10:08:56 2020 - [info] Checking recovery script configurations on 192.168.1.161(192.168.1.161:3306)..
Sun Oct 11 10:08:56 2020 - [info]   Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/data/mastermha/cluster01//save_binary_logs_test --manager_version=0.58 --start_file=mysql-bin.000011 
Sun Oct 11 10:08:56 2020 - [info]   Connecting to root@192.168.1.161(192.168.1.161:22).. 
  Creating /data/mastermha/cluster01 if not exists..    ok.
  Checking output directory is accessible or not..
   ok.
  Binlog found at /var/lib/mysql, up to mysql-bin.000011
Sun Oct 11 10:08:57 2020 - [info] Binlog setting check done.
Sun Oct 11 10:08:57 2020 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Sun Oct 11 10:08:57 2020 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.162 --slave_ip=192.168.1.162 --slave_port=3306 --workdir=/data/mastermha/cluster01/ --target_version=5.7.28-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Sun Oct 11 10:08:57 2020 - [info]   Connecting to root@192.168.1.162(192.168.1.162:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Oct 11 10:08:57 2020 - [info]   Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=192.168.1.163 --slave_ip=192.168.1.163 --slave_port=3306 --workdir=/data/mastermha/cluster01/ --target_version=5.7.28-log --manager_version=0.58 --relay_log_info=/var/lib/mysql/relay-log.info  --relay_dir=/var/lib/mysql/  --slave_pass=xxx
Sun Oct 11 10:08:57 2020 - [info]   Connecting to root@192.168.1.163(192.168.1.163:22).. 
  Checking slave recovery environment settings..
    Opening /var/lib/mysql/relay-log.info ... ok.
    Relay log found at /var/lib/mysql, up to mysql-relay-bin.000002
    Temporary relay log file is /var/lib/mysql/mysql-relay-bin.000002
    Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
    Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
 done.
    Testing mysqlbinlog output.. done.
    Cleaning up test file(s).. done.
Sun Oct 11 10:08:58 2020 - [info] Slaves settings check done.
Sun Oct 11 10:08:58 2020 - [info] 
192.168.1.161(192.168.1.161:3306) (current master)
 +--192.168.1.162(192.168.1.162:3306)
 +--192.168.1.163(192.168.1.163:3306)

Sun Oct 11 10:08:58 2020 - [info] Checking replication health on 192.168.1.162..
Sun Oct 11 10:08:58 2020 - [info]  ok.
Sun Oct 11 10:08:58 2020 - [info] Checking replication health on 192.168.1.163..
Sun Oct 11 10:08:58 2020 - [info]  ok.
Sun Oct 11 10:08:58 2020 - [warning] master_ip_failover_script is not defined.
Sun Oct 11 10:08:58 2020 - [warning] shutdown_script is not defined.
Sun Oct 11 10:08:58 2020 - [info] Got exit code 0 (Not master dead).

MySQL Replication Health is OK.

3.4.3 启动MHA并查看状态

[root@mha ~]# nohup masterha_manager --conf=/etc/mha/cluster01.conf >  /data/mastermha/cluster01/sys.log 2>&1 &

#查看运行状态
[root@mha ~]# masterha_check_status --conf=/etc/mha/cluster01.conf
cluster01 (pid:115312) is running(0:PING_OK), master:192.168.1.161
[root@mha ~]#

3.4.4 模拟master故障验证MHA切换，并且将原master节点切换为新的slave

将192.168.1.161重启，此时查看mha上日志，观察主节点切换过程

[root@mha cluster01]# cd /data/mastermha/cluster01
[root@mha cluster01]# tail -f manager.log 
192.168.1.161(192.168.1.161:3306) (current master)
 +--192.168.1.162(192.168.1.162:3306)
 +--192.168.1.163(192.168.1.163:3306)

Sun Oct 11 10:10:33 2020 - [warning] master_ip_failover_script is not defined.
Sun Oct 11 10:10:33 2020 - [warning] shutdown_script is not defined.
Sun Oct 11 10:10:33 2020 - [info] Set master ping interval 1 seconds.
Sun Oct 11 10:10:33 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Sun Oct 11 10:10:33 2020 - [info] Starting ping health check on 192.168.1.161(192.168.1.161:3306)..
Sun Oct 11 10:10:33 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
Sun Oct 11 10:16:46 2020 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Sun Oct 11 10:16:46 2020 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql --output_file=/data/mastermha/cluster01//save_binary_logs_test --manager_version=0.58 --binlog_prefix=mysql-bin
Sun Oct 11 10:16:46 2020 - [warning] HealthCheck: SSH to 192.168.1.161 is NOT reachable.
Sun Oct 11 10:16:47 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.161' (111))
Sun Oct 11 10:16:47 2020 - [warning] Connection failed 2 time(s)..
Sun Oct 11 10:16:48 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.161' (111))
Sun Oct 11 10:16:48 2020 - [warning] Connection failed 3 time(s)..
Sun Oct 11 10:16:49 2020 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '192.168.1.161' (111))
Sun Oct 11 10:16:49 2020 - [warning] Connection failed 4 time(s)..
Sun Oct 11 10:16:49 2020 - [warning] Master is not reachable from health checker!
Sun Oct 11 10:16:49 2020 - [warning] Master 192.168.1.161(192.168.1.161:3306) is not reachable!
Sun Oct 11 10:16:49 2020 - [warning] SSH is NOT reachable.
Sun Oct 11 10:16:49 2020 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mha/cluster01.conf again, and trying to connect to all servers to check server status..
Sun Oct 11 10:16:49 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Oct 11 10:16:49 2020 - [info] Reading application default configuration from /etc/mha/cluster01.conf..
Sun Oct 11 10:16:49 2020 - [info] Reading server configuration from /etc/mha/cluster01.conf..
Sun Oct 11 10:16:50 2020 - [info] GTID failover mode = 0
Sun Oct 11 10:16:50 2020 - [info] Dead Servers:
Sun Oct 11 10:16:50 2020 - [info]   192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:50 2020 - [info] Alive Servers:
Sun Oct 11 10:16:50 2020 - [info]   192.168.1.162(192.168.1.162:3306)
Sun Oct 11 10:16:50 2020 - [info]   192.168.1.163(192.168.1.163:3306)
Sun Oct 11 10:16:50 2020 - [info] Alive Slaves:
Sun Oct 11 10:16:50 2020 - [info]   192.168.1.162(192.168.1.162:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:50 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:50 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:50 2020 - [info]   192.168.1.163(192.168.1.163:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:50 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:50 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:50 2020 - [info] Checking slave configurations..
Sun Oct 11 10:16:50 2020 - [info] Checking replication filtering settings..
Sun Oct 11 10:16:50 2020 - [info]  Replication filtering check ok.
Sun Oct 11 10:16:50 2020 - [info] Master is down!
Sun Oct 11 10:16:50 2020 - [info] Terminating monitoring script.
Sun Oct 11 10:16:50 2020 - [info] Got exit code 20 (Master dead).
Sun Oct 11 10:16:50 2020 - [info] MHA::MasterFailover version 0.58.
Sun Oct 11 10:16:50 2020 - [info] Starting master failover.
Sun Oct 11 10:16:50 2020 - [info] 
Sun Oct 11 10:16:50 2020 - [info] * Phase 1: Configuration Check Phase..
Sun Oct 11 10:16:50 2020 - [info] 
Sun Oct 11 10:16:51 2020 - [info] GTID failover mode = 0
Sun Oct 11 10:16:51 2020 - [info] Dead Servers:
Sun Oct 11 10:16:51 2020 - [info]   192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:51 2020 - [info] Checking master reachability via MySQL(double check)...
Sun Oct 11 10:16:51 2020 - [info]  ok.
Sun Oct 11 10:16:51 2020 - [info] Alive Servers:
Sun Oct 11 10:16:51 2020 - [info]   192.168.1.162(192.168.1.162:3306)
Sun Oct 11 10:16:51 2020 - [info]   192.168.1.163(192.168.1.163:3306)
Sun Oct 11 10:16:51 2020 - [info] Alive Slaves:
Sun Oct 11 10:16:51 2020 - [info]   192.168.1.162(192.168.1.162:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:51 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:51 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:51 2020 - [info]   192.168.1.163(192.168.1.163:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:51 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:51 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:51 2020 - [info] Starting Non-GTID based failover.
Sun Oct 11 10:16:51 2020 - [info] 
Sun Oct 11 10:16:51 2020 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Oct 11 10:16:51 2020 - [info] 
Sun Oct 11 10:16:51 2020 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Oct 11 10:16:51 2020 - [info] 
Sun Oct 11 10:16:51 2020 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Oct 11 10:16:51 2020 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Sun Oct 11 10:16:51 2020 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Oct 11 10:16:52 2020 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 3: Master Recovery Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] The latest binary log file/position on all slaves is mysql-bin.000011:1403
Sun Oct 11 10:16:52 2020 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Oct 11 10:16:52 2020 - [info]   192.168.1.162(192.168.1.162:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:52 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:52 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:52 2020 - [info]   192.168.1.163(192.168.1.163:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:52 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:52 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:52 2020 - [info] The oldest binary log file/position on all slaves is mysql-bin.000011:1403
Sun Oct 11 10:16:52 2020 - [info] Oldest slaves:
Sun Oct 11 10:16:52 2020 - [info]   192.168.1.162(192.168.1.162:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:52 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:52 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:52 2020 - [info]   192.168.1.163(192.168.1.163:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:52 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:52 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [warning] Dead Master is not SSH reachable. Could not save it's binlogs. Transactions that were not sent to the latest slave (Read_Master_Log_Pos to the tail of the dead master's binlog) were lost.
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 3.3: Determining New Master Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Sun Oct 11 10:16:52 2020 - [info] All slaves received relay logs to the same position. No need to resync each other.
Sun Oct 11 10:16:52 2020 - [info] Searching new master from slaves..
Sun Oct 11 10:16:52 2020 - [info]  Candidate masters from the configuration file:
Sun Oct 11 10:16:52 2020 - [info]   192.168.1.162(192.168.1.162:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:52 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:52 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:52 2020 - [info]   192.168.1.163(192.168.1.163:3306)  Version=5.7.28-log (oldest major version between slaves) log-bin:enabled
Sun Oct 11 10:16:52 2020 - [info]     Replicating from 192.168.1.161(192.168.1.161:3306)
Sun Oct 11 10:16:52 2020 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Oct 11 10:16:52 2020 - [info]  Non-candidate masters:
Sun Oct 11 10:16:52 2020 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Sun Oct 11 10:16:52 2020 - [info] New master is 192.168.1.162(192.168.1.162:3306)
Sun Oct 11 10:16:52 2020 - [info] Starting master failover..
Sun Oct 11 10:16:52 2020 - [info] 
From:
192.168.1.161(192.168.1.161:3306) (current master)
 +--192.168.1.162(192.168.1.162:3306)
 +--192.168.1.163(192.168.1.163:3306)

To:
192.168.1.162(192.168.1.162:3306) (new master)
 +--192.168.1.163(192.168.1.163:3306)
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 3.5: Master Log Apply Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Sun Oct 11 10:16:52 2020 - [info] Starting recovery on 192.168.1.162(192.168.1.162:3306)..
Sun Oct 11 10:16:52 2020 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Sun Oct 11 10:16:52 2020 - [info]   done.
Sun Oct 11 10:16:52 2020 - [info]  All relay logs were successfully applied.
Sun Oct 11 10:16:52 2020 - [info] Getting new master's binlog name and position..
Sun Oct 11 10:16:52 2020 - [info]  mysql-bin.000011:297
Sun Oct 11 10:16:52 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.162', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=297, MASTER_USER='root', MASTER_PASSWORD='xxx';
Sun Oct 11 10:16:52 2020 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Sun Oct 11 10:16:52 2020 - [info] Setting read_only=0 on 192.168.1.162(192.168.1.162:3306)..
Sun Oct 11 10:16:52 2020 - [info]  ok.
Sun Oct 11 10:16:52 2020 - [info] ** Finished master recovery successfully.
Sun Oct 11 10:16:52 2020 - [info] * Phase 3: Master Recovery Phase completed.
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 4: Slaves Recovery Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Sun Oct 11 10:16:52 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info] -- Slave diff file generation on host 192.168.1.163(192.168.1.163:3306) started, pid: 123426. Check tmp log /data/mastermha/cluster01//192.168.1.163_3306_20201011101650.log if it takes time..
Sun Oct 11 10:16:53 2020 - [info] 
Sun Oct 11 10:16:53 2020 - [info] Log messages from 192.168.1.163 ...
Sun Oct 11 10:16:53 2020 - [info] 
Sun Oct 11 10:16:52 2020 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Sun Oct 11 10:16:53 2020 - [info] End of log messages from 192.168.1.163.
Sun Oct 11 10:16:53 2020 - [info] -- 192.168.1.163(192.168.1.163:3306) has the latest relay log events.
Sun Oct 11 10:16:53 2020 - [info] Generating relay diff files from the latest slave succeeded.
Sun Oct 11 10:16:53 2020 - [info] 
Sun Oct 11 10:16:53 2020 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Sun Oct 11 10:16:53 2020 - [info] 
Sun Oct 11 10:16:53 2020 - [info] -- Slave recovery on host 192.168.1.163(192.168.1.163:3306) started, pid: 123458. Check tmp log /data/mastermha/cluster01//192.168.1.163_3306_20201011101650.log if it takes time..
Sun Oct 11 10:16:54 2020 - [info] 
Sun Oct 11 10:16:54 2020 - [info] Log messages from 192.168.1.163 ...
Sun Oct 11 10:16:54 2020 - [info] 
Sun Oct 11 10:16:53 2020 - [info] Starting recovery on 192.168.1.163(192.168.1.163:3306)..
Sun Oct 11 10:16:53 2020 - [info]  This server has all relay logs. Waiting all logs to be applied.. 
Sun Oct 11 10:16:53 2020 - [info]   done.
Sun Oct 11 10:16:53 2020 - [info]  All relay logs were successfully applied.
Sun Oct 11 10:16:53 2020 - [info]  Resetting slave 192.168.1.163(192.168.1.163:3306) and starting replication from the new master 192.168.1.162(192.168.1.162:3306)..
Sun Oct 11 10:16:53 2020 - [info]  Executed CHANGE MASTER.
Sun Oct 11 10:16:54 2020 - [info]  Slave started.
Sun Oct 11 10:16:54 2020 - [info] End of log messages from 192.168.1.163.
Sun Oct 11 10:16:54 2020 - [info] -- Slave recovery on host 192.168.1.163(192.168.1.163:3306) succeeded.
Sun Oct 11 10:16:54 2020 - [info] All new slave servers recovered successfully.
Sun Oct 11 10:16:54 2020 - [info] 
Sun Oct 11 10:16:54 2020 - [info] * Phase 5: New master cleanup phase..
Sun Oct 11 10:16:54 2020 - [info] 
Sun Oct 11 10:16:54 2020 - [info] Resetting slave info on the new master..
Sun Oct 11 10:16:54 2020 - [info]  192.168.1.162: Resetting slave info succeeded.
Sun Oct 11 10:16:54 2020 - [info] Master failover to 192.168.1.162(192.168.1.162:3306) completed successfully.
Sun Oct 11 10:16:54 2020 - [info] 

----- Failover Report -----

cluster01: MySQL Master failover 192.168.1.161(192.168.1.161:3306) to 192.168.1.162(192.168.1.162:3306) succeeded

Master 192.168.1.161(192.168.1.161:3306) is down!

Check MHA Manager logs at mha:/data/mastermha/cluster01/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 192.168.1.162(192.168.1.162:3306) has all relay logs for recovery.
Selected 192.168.1.162(192.168.1.162:3306) as a new master.
192.168.1.162(192.168.1.162:3306): OK: Applying all logs succeeded.
192.168.1.163(192.168.1.163:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
192.168.1.163(192.168.1.163:3306): OK: Applying all logs succeeded. Slave started, replicating from 192.168.1.162(192.168.1.162:3306)
192.168.1.162(192.168.1.162:3306): Resetting slave info succeeded.
Master failover to 192.168.1.162(192.168.1.162:3306) completed successfully.

可以看到162节点切成功切换为新的master。接下来说明下161如何恢复为新的slave：

在192.168.1.164MHA manager上查看恢复语句。

[root@mha cluster01]#  cd /data/mastermha/cluster01
[root@mha cluster01]# grep "CHANGE MASTER TO MASTER" manager.log | tail -1
Sun Oct 11 10:16:52 2020 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.1.162', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=297, MASTER_USER='root', MASTER_PASSWORD='xxx';
[root@mha cluster01]#

记录下 CHANGE MASTER TO MASTER_HOST=’192.168.1.162’, MASTER_PORT=3306, MASTER_LOG_FILE=’mysql-bin.000011’, MASTER_LOG_POS=297, MASTER_USER=’root’, MASTER_PASSWORD=’xxx’;

MASTER_PASSWORD值修改为实际Mysql root密码，在192.168.1.161上执行如下操作：

##因为161之前为master，需要先将Mysql设置为只读模式
[root@master ~]# mysql -u root -p
Enter password: 
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.28-log MySQL Community Server (GPL)

Copyright (c) 2000, 2019, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> set global read_only=1;
Query OK, 0 rows affected (0.00 sec)

mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.162', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=297, MASTER_USER='root', MASTER_PASSWORD='Mysql@123';
ERROR 3021 (HY000): This operation cannot be performed with a running slave io thread; run STOP SLAVE IO_THREAD FOR CHANNEL '' first.
mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)

mysql> CHANGE MASTER TO MASTER_HOST='192.168.1.162', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000011', MASTER_LOG_POS=297, MASTER_USER='root', MASTER_PASSWORD='Mysql@123';
Query OK, 0 rows affected, 2 warnings (0.00 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.1.162
                  Master_User: root
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000011
          Read_Master_Log_Pos: 297
               Relay_Log_File: master-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000011
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 297
              Relay_Log_Space: 528
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 2
                  Master_UUID: 9eaa3d1a-d77d-4e93-9f92-8d0e3c2fc3bb
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)

ERROR: 
No query specified

mysql>

此时161恢复完成，成为新的slave节点。然后可以在162执行sql进行主从同步验证啦！

至此，Mysql一主双从+MHA高可用集群搭建完成。

注意，MHA主备切换完成后，MHA会自动关闭。需要手工修改配置，并且将/data/mastermha/cluster01/cluster01.failover.complete文件删除后，再次启动。否则下次自动切换会出错！！！！

附MHA常用命令

1 检查mha的ssh免密登录状态
masterha_check_ssh --conf=/etc/mha/cluster01.conf
2 检查mha的运行状态
masterha_check_status --conf=/etc/mha/cluster01.conf
3 检查主备库的复制情况
masterha_check_repl --conf=/etc/mha/cluster01.conf
4 停止mha
masterha_stop --conf=/etc/mha/cluster01.conf
5 启动mha
nohup masterha_manager --conf=/etc/mha/cluster01.conf >  /data/mastermha/cluster01/sys.log 2>&1 &
6 mha手动切换主库
masterha_master_switch --conf=/etc/mha/cluster01.conf --master_state=alive --new_master_host=192.168.1.161 --new_master_port=3306 --orig_master_is_new_slave

1.环境说明

1.1 操作系统和软件版本

1.2 主机角色介绍

2.一主双从半同步复制

2.1 master节点配置文件修改

2.2 master节点安装半同步插件

2.3 slave节点配置文件修改

2.4 slave节点安装半同步复制插件

2.5 开启主从同步

2.5.1 master节点操作

2.5.2 slave节点操作

2.3 验证主从同步功能

3.搭建MHA

3.1 配置三台Mysql节点ssh公钥互信

3.2 MHA安装

3.3 MHA Manager 端配置

3.4 MHA验证及启动

3.4.1 检查mha的ssh免密登录状态

3.4.2 检查主备库的复制情况

3.4.3 启动MHA并查看状态

3.4.4 模拟master故障验证MHA切换，并且将原master节点切换为新的slave

预览: