MySQL High Availability with MMM
Environment:
Role | IP | server-id | Write VIP | Read VIP |
---|---|---|---|---|
master1 | 192.168.1.7 | 1 | 192.168.1.100 | |
master2 | 192.168.1.8 | 2 | | 192.168.1.101 |
slave1 | 192.168.1.10 | 3 | | 192.168.1.102 |
slave2 | 192.168.1.12 | 4 | | 192.168.1.103 |
monitor | 192.168.1.13 | none | | |
1. Environment Preparation
① Install the Perl environment (all hosts)
[root@192 ~]# yum -y install perl-* libart_lgpl.x86_64 rrdtool.x86_64 rrdtool-perl.x86_64
MMM is implemented in Perl, so be sure to install all of the Perl packages. If yum reports errors, investigate carefully; a common fix is yum remove -y libvirt-client, then rerun the yum install.
② Install the required Perl modules (all hosts)
[root@192 ~]# cpan -i Algorithm::Diff Class::Singleton DBI DBD::mysql Log::Dispatch Log::Log4perl Mail::Send Net::Ping Proc::Daemon Time::HiRes Params::Validate Net::ARP
When prompted, press Enter to continue.
③ Stop the firewall (or open the required ports), disable SELinux, and set each host's hostname
mmm_agent: the agent listens on port 9989
mmm_monitor: the monitor listens on port 9988
[root@192 ~]# hostnamectl set-hostname master1
[root@192 ~]# hostnamectl set-hostname master2
[root@192 ~]# hostnamectl set-hostname slave1
[root@192 ~]# hostnamectl set-hostname slave2
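If you prefer opening the needed ports to stopping the firewall entirely, the step above can be done with firewalld on CentOS 7. A minimal sketch, assuming firewalld is the active firewall (adapt to iptables otherwise):

```shell
# Open the MMM agent and monitor ports instead of disabling the firewall
firewall-cmd --permanent --add-port=9989/tcp   # mmm_agent
firewall-cmd --permanent --add-port=9988/tcp   # mmm_monitor
firewall-cmd --permanent --add-service=mysql   # MySQL on 3306
firewall-cmd --reload

# Disable SELinux now and on subsequent boots
setenforce 0
sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
```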
2. Configuring Master-Slave Replication
Replication topology:
master2, slave1, and slave2 all replicate from master1
master1 replicates from master2 (the two masters form a master-master pair)
①master1:
[root@master1 ~]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
server_id = 1
socket = /usr/local/mysql/mysql.sock
log-error=/usr/local/mysql/data/mysqld.err
log-bin = mysql-bin
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = 1
auto-increment-increment = 2
auto-increment-offset = 1
binlog_format = mixed
[client]
host = 127.0.0.1
user = root
password = 123.com
②master2:
[root@master2 ~]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
server_id = 2
socket = /usr/local/mysql/mysql.sock
log-error=/usr/local/mysql/data/mysqld.err
log-bin = mysql-bin
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
log-slave-updates = 1
auto-increment-increment = 2
auto-increment-offset = 2
binlog_format = mixed
[client]
host = 127.0.0.1
user = root
password = 123.com
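The auto-increment-increment / auto-increment-offset pair in the two master configs above is what prevents the master-master pair from handing out colliding AUTO_INCREMENT ids: each master generates ids of the form offset + n × increment, so master1 takes the odd ids and master2 the even ones. A tiny shell illustration of the interleaving:

```shell
# master1: offset 1, increment 2 -> odd ids
# master2: offset 2, increment 2 -> even ids
echo "master1 ids: $(seq 1 2 9 | tr '\n' ' ')"
echo "master2 ids: $(seq 2 2 10 | tr '\n' ' ')"
# master1 ids: 1 3 5 7 9
# master2 ids: 2 4 6 8 10
```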
③slave1:
[root@slave1 ~]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
server_id = 3
socket = /usr/local/mysql/mysql.sock
log-error=/usr/local/mysql/data/mysqld.err
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
[client]
host = 127.0.0.1
user = root
password = 123.com
④slave2:
[root@slave2 ~]# cat /etc/my.cnf
[mysqld]
basedir = /usr/local/mysql
datadir = /usr/local/mysql/data
port = 3306
server_id = 4
socket = /usr/local/mysql/mysql.sock
log-error=/usr/local/mysql/data/mysqld.err
relay-log = relay-bin
relay-log-index = slave-relay-bin.index
[client]
host = 127.0.0.1
user = root
password = 123.com
⑤ Grant replication privileges and point the slaves at their masters
master1:
mysql> grant replication slave on *.* to 'myslave'@'%' identified by '123.com';
mysql> show master status \G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 154
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
master2:
mysql> grant replication slave on *.* to 'myslave'@'%' identified by '123.com';
mysql> show master status \G
*************************** 1. row ***************************
File: mysql-bin.000001
Position: 154
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
mysql> change master to master_host='192.168.1.7',master_user='myslave',master_password='123.com',master_log_file='mysql-bin.000001',master_log_pos=154;
mysql> start slave;
slave1:
mysql> change master to master_host='192.168.1.7',master_user='myslave',master_password='123.com',master_log_file='mysql-bin.000001',master_log_pos=154;
mysql> start slave;
slave2:
mysql> change master to master_host='192.168.1.7',master_user='myslave',master_password='123.com',master_log_file='mysql-bin.000001',master_log_pos=154;
mysql> start slave;
master1:
mysql> change master to master_host='192.168.1.8',master_user='myslave',master_password='123.com',master_log_file='mysql-bin.000001',master_log_pos=154;
mysql> start slave;
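Before layering MMM on top, it is worth confirming that every replication thread is healthy. A small check to run locally on each replicating host, using the root credentials from the [client] sections above:

```shell
# Both threads must report Yes on master1, master2, slave1 and slave2
mysql -uroot -p123.com -e 'show slave status\G' \
    | grep -E 'Slave_(IO|SQL)_Running:|Seconds_Behind_Master'
```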
3. Configuring MMM
Agent daemon setup (on the four DB hosts):
① Create the monitoring and agent users (with replication running, executing on master1 alone is enough)
mysql> grant super,replication client,process on *.* to 'mmm_agent'@'%' identified by '123.com';
mysql> grant replication client on *.* to 'mmm_monitor'@'%' identified by '123.com';
mysql> select user,host from mysql.user where user in ('mmm_monitor','mmm_agent');
user | host |
---|---|
mmm_agent | % |
mmm_monitor | % |
② Install MMM (required on the monitor host and all four DB hosts)
[root@master1 ~]# wget http://mysql-mmm.org/_media/:mmm2:mysql-mmm-2.2.1.tar.gz
[root@master1 ~]# tar zxf :mmm2:mysql-mmm-2.2.1.tar.gz
[root@master1 ~]# cd mysql-mmm-2.2.1/
[root@master1 mysql-mmm-2.2.1]# make && make install
③ Configure mmm_common.conf (must be identical on the monitor host and the four DB hosts)
[root@master1 mysql-mmm]# vim mmm_common.conf
active_master_role writer
<host default>
cluster_interface eno16777736
pid_path /var/run/mmm_agentd.pid
bin_path /usr/lib/mysql-mmm/
replication_user myslave
replication_password 123.com
agent_user mmm_agent
agent_password 123.com
</host>
<host master1>
ip 192.168.1.7
mode master
peer master2
</host>
<host master2>
ip 192.168.1.8
mode master
peer master1
</host>
<host slave1>
ip 192.168.1.10
mode slave
</host>
<host slave2>
ip 192.168.1.12
mode slave
</host>
<role writer>
hosts master1,master2
ips 192.168.1.100
mode exclusive
</role>
<role reader>
hosts master2,slave1,slave2
ips 192.168.1.101, 192.168.1.102, 192.168.1.103
mode balanced
</role>
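Since mmm_common.conf must be byte-for-byte identical on all five hosts, it is least error-prone to edit it once and push it out. A sketch, assuming root SSH access from master1 to the other four hosts:

```shell
# Copy the shared MMM config from master1 to the other DB hosts
# and to the monitor (192.168.1.13)
for h in 192.168.1.8 192.168.1.10 192.168.1.12 192.168.1.13; do
    scp /etc/mysql-mmm/mmm_common.conf root@"$h":/etc/mysql-mmm/
done
```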
④ Configure mmm_agent.conf (not needed on the monitor host; on each of the four DB hosts, set the this directive to that host's own name)
[root@master1 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this master1
[root@master2 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this master2
[root@slave1 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this slave1
[root@slave2 ~]# cat /etc/mysql-mmm/mmm_agent.conf
include mmm_common.conf
this slave2
⑤ Edit the mysql-mmm-agent init script (on all five hosts)
[root@master1 ~]# cat /etc/init.d/mysql-mmm-agent
#!/bin/sh
source /root/.bash_profile
# mysql-mmm-agent This shell script takes care of starting and stopping
# the mmm agent daemon.
#
# chkconfig: - 64 36
.......
⑥ Start the mmm-agent daemon (on the four DB hosts)
[root@master1 ~]# chkconfig --add mysql-mmm-agent
[root@master1 ~]# chkconfig mysql-mmm-agent on
[root@master1 ~]# /etc/init.d/mysql-mmm-agent start
Daemon bin: '/usr/sbin/mmm_agentd'
Daemon pid: '/var/run/mmm_agentd.pid'
Starting MMM Agent daemon... running.
If the agent fails to start, the cause is usually the Perl environment; install whichever module the error message reports as missing.
[root@master1 ~]# ss -anpt | grep agentd
LISTEN 0 10 192.168.1.4:9989 *:* users:(("mmm_agentd",pid=26468,fd=3))
Monitor daemon setup (on the monitor host):
① Install MMM (as above)
② Configure mmm_common.conf (identical to the DB hosts)
③ Configure mmm_mon.conf
[root@localhost ~]# cat /etc/mysql-mmm/mmm_mon.conf
include mmm_common.conf
<monitor>
ip 127.0.0.1
pid_path /var/run/mmm_mond.pid
bin_path /usr/lib/mysql-mmm/
status_path /var/lib/misc/mmm_mond.status
ping_ips 192.168.1.7,192.168.1.8,192.168.1.10,192.168.1.12
auto_set_online 0
</monitor>
<host default>
monitor_user mmm_monitor
monitor_password 123.com
</host>
debug 0
④ Start the mmm-monitor daemon
[root@localhost ~]# chkconfig --add mysql-mmm-monitor
[root@localhost ~]# chkconfig mysql-mmm-monitor on
[root@loaclhost ~]# /etc/init.d/mysql-mmm-monitor start
Daemon bin: '/usr/sbin/mmm_mond'
Daemon pid: '/var/run/mmm_mond.pid'
Starting MMM Monitor daemon: running
[root@localhost ~]# ss -anpt | grep mond
LISTEN 0 10 127.0.0.1:9988 *:* users:(("mmm_mond",pid=27378,fd=9))
4. Testing MMM High Availability
Steps:
① When the monitor is first started, check the state of each node
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/AWAITING_RECOVERY. Roles:
master2(192.168.1.8) master/AWAITING_RECOVERY. Roles:
slave1(192.168.1.10) slave/AWAITING_RECOVERY. Roles:
slave2(192.168.1.12) slave/AWAITING_RECOVERY. Roles:
② Bring each node online
[root@localhost ~]# mmm_control set_online slave1
OK: State of 'slave1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost ~]# mmm_control set_online master1
OK: State of 'master1' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost ~]# mmm_control set_online master2
OK: State of 'master2' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost ~]# mmm_control set_online slave2
OK: State of 'slave2' changed to ONLINE. Now you can wait some time and check its new roles!
[root@localhost init.d]# mmm_control show
master1(192.168.1.4) master/ONLINE. Roles: writer(192.168.1.100)
master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.103)
slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.101)
slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.102)
③ Check the VIP assignment on each node
master1:
[root@master1 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:81:20:3c brd ff:ff:ff:ff:ff:ff
inet 192.168.1.4/24 brd 192.168.1.255 scope global dynamic eno16777736
valid_lft 78128sec preferred_lft 78128sec
inet 192.168.1.100/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe81:203c/64 scope link
valid_lft forever preferred_lft forever
master2:
[root@master2 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f6:7f:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.8/24 brd 192.168.1.255 scope global dynamic eno16777736
valid_lft 78143sec preferred_lft 78143sec
inet 192.168.1.103/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fef6:7f57/64 scope link
valid_lft forever preferred_lft forever
slave1:
[root@slave1 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:4b:6a:1e brd ff:ff:ff:ff:ff:ff
inet 192.168.1.10/24 brd 192.168.1.255 scope global dynamic eno16777736
valid_lft 78172sec preferred_lft 78172sec
inet 192.168.1.101/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe4b:6a1e/64 scope link
valid_lft forever preferred_lft forever
slave2:
[root@slave2 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:db:f7:b8 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.12/24 brd 192.168.1.255 scope global dynamic eno16777736
valid_lft 80764sec preferred_lft 80764sec
inet 192.168.1.102/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fedb:f7b8/64 scope link
valid_lft forever preferred_lft forever
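With the VIPs assigned, clients should connect through them rather than the real IPs. A quick functional check from any machine on the subnet (the myslave account is reused here purely for illustration; in practice create a dedicated application account):

```shell
# The writer VIP should land on master1; each reader VIP on a different host
mysql -h192.168.1.100 -umyslave -p123.com -e 'select @@hostname;'
for vip in 192.168.1.101 192.168.1.102 192.168.1.103; do
    mysql -h"$vip" -umyslave -p123.com -e 'select @@hostname;'
done
```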
④ Simulate a master1 outage and observe the VIPs and the replication state
master1:
[root@master1 ~]# systemctl stop mysqld
monitor:
[root@localhost ~]# tailf /var/log/mysql-mmm/mmm_mond.log
2018/04/22 15:14:05 WARN Check 'rep_backlog' on 'master1' is in unknown state! Message: UNKNOWN: Connect error (host = 192.168.1.7:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.1.7' (111)
2018/04/22 15:14:16 FATAL State of host 'master1' changed from ONLINE to HARD_OFFLINE (ping: OK, mysql: not OK)
2018/04/22 15:14:16 INFO Removing all roles from host 'master1':
2018/04/22 15:14:16 INFO Removed role 'writer(192.168.1.100)' from host 'master1'
2018/04/22 15:14:16 INFO Orphaned role 'writer(192.168.1.100)' has been assigned to 'master2'
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/HARD_OFFLINE. Roles:
master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.101), writer(192.168.1.100)
slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.102)
slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.103)
[root@localhost ~]# mmm_control checks all
master1 ping [last change: 2018/04/22 15:42:02] OK
master1 mysql [last change: 2018/04/22 15:47:57] ERROR: Connect error (host = 192.168.1.7:3306, user = mmm_monitor)! Can't connect to MySQL server on '192.168.1.7' (111)
master2:
[root@master2 ~]# ip a
2: eno16777736: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 00:0c:29:f6:7f:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.8/24 brd 192.168.1.255 scope global dynamic eno16777736
valid_lft 77901sec preferred_lft 77901sec
inet 192.168.1.103/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet 192.168.1.100/32 scope global eno16777736
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fef6:7f57/64 scope link
valid_lft forever preferred_lft forever
slave1:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.8
Master_User: myslave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 154
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
slave2:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.1.8
Master_User: myslave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 154
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000001
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
.......
⑤ Manually restart master1 and observe the state changes
master1:
[root@master1 ~]# systemctl restart mysqld
monitor:
[root@localhost ~]# tailf /var/log/mysql-mmm/mmm_mond.log
2018/04/22 15:53:27 INFO Check 'mysql' on 'master1' is ok!
2018/04/22 15:53:28 FATAL State of host 'master1' changed from HARD_OFFLINE to AWAITING_RECOVERY
2018/04/22 15:53:28 INFO Check 'rep_threads' on 'master1' is ok!
2018/04/22 15:53:28 INFO Check 'rep_backlog' on 'master1' is ok
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/AWAITING_RECOVERY. Roles:
master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.101), writer(192.168.1.100)
slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.102)
slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.103)
master1 has moved from HARD_OFFLINE to AWAITING_RECOVERY; bring it back online manually
[root@localhost ~]# mmm_control set_online master1
OK: State of 'master1' changed to ONLINE. Now you can wait some time and check its new roles!
Checking the latest cluster state shows that a recovered master does not automatically take the writer role back
[root@localhost ~]# mmm_control show
master1(192.168.1.7) master/ONLINE. Roles:
master2(192.168.1.8) master/ONLINE. Roles: reader(192.168.1.101), writer(192.168.1.100)
slave1(192.168.1.10) slave/ONLINE. Roles: reader(192.168.1.102)
slave2(192.168.1.12) slave/ONLINE. Roles: reader(192.168.1.103)
5. Summary
① If master2 (the standby master) goes down, cluster service is unaffected; its reader role is simply removed.
② If master1 (the active master) goes down, master2 takes over the writer role, and slave1 and slave2 are automatically re-pointed (CHANGE MASTER TO) to the new master.
③ If master1 dies while master2 has fallen behind in applying replicated events, master2 still becomes the writable master, and the data can end up inconsistent.
④ If master2, slave1, and slave2 are all lagging behind master1 when it dies, slave1 and slave2 finish applying what they received from master1 and then re-point to master2; the data can still end up inconsistent.
⑤ With an MMM architecture, the active and standby masters should have identical hardware; enabling semi-synchronous replication further improves safety, and MariaDB or MySQL 5.7 multi-threaded slave replication improves replication performance.
⑥ Depending on whether auto_set_online is enabled in mmm_mon.conf, the monitor checks host state every 60 seconds and promotes hosts from AWAITING_RECOVERY to ONLINE once they have recovered from failure (HARD_OFFLINE). The three monitored states progress HARD_OFFLINE → AWAITING_RECOVERY → ONLINE.
⑦ The externally visible VIPs are managed by the monitor: if the monitor is never started, no VIP is assigned. If VIPs have already been assigned and the monitor is then shut down, they are not withdrawn immediately; as long as the network is not restarted, external programs can keep using them, which somewhat lowers the reliability requirement on the monitor itself. But if a DB server actually crashes and its VIP needs to move, clients still pointing at the dead server will get no service.
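Point ⑤ suggests semi-synchronous replication as a hardening step. A sketch of enabling it with the stock MySQL 5.7 semisync plugins (both plugins go on both masters, since each side is also a slave of the other; this is an illustration, not part of the original setup):

```shell
# Load and enable the semisync plugins on master1 and master2
mysql -uroot -p123.com <<'EOF'
install plugin rpl_semi_sync_master soname 'semisync_master.so';
install plugin rpl_semi_sync_slave  soname 'semisync_slave.so';
set global rpl_semi_sync_master_enabled = 1;
set global rpl_semi_sync_slave_enabled  = 1;
EOF
```

To survive restarts, the two rpl_semi_sync_*_enabled settings should also be added to my.cnf.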