说明:
系统:centos7
主库 M:192.168.16.12:3306
从库 S:192.168.16.15:3306
主从复制:传统复制
一、场景
M、S目前基于log、pos点的复制,想要将其在线切换为基于GTID的复制,
操作步骤(M、S都需执行,哪个先不影响):
1、M、S:mysql> set @@global.enforce_gtid_consistency=warn;
2、M、S:mysql> set @@global.enforce_gtid_consistency=on;
3、M、S:mysql> SET @@GLOBAL.GTID_MODE = OFF_PERMISSIVE;
4、M、S:mysql> SET @@GLOBAL.GTID_MODE = on_permissive;
5、M、S:Mysql> SHOW STATUS LIKE 'ONGOING_ANONYMOUS_TRANSACTION_COUNT'; //是否=0
6、M、S:mysql> SET @@GLOBAL.GTID_MODE = ON;
7、M、S:修改配置文件
配置完成后查看slave状态:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.16.12
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000079
Read_Master_Log_Pos: 584132
Relay_Log_File: relay-log.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000079
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 584132
Relay_Log_Space: 4
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 58343797-e7f4-11e8-9c78-246e968f00b0
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: System lock
Master_Retry_Count: 86400
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 58343797-e7f4-11e8-9c78-246e968f00b0:1-442
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)
二、现象
1、连续查看好多次slave的状态,这几个点都一直不变(此时master的这几个点一直在变大,gtid增加到了800多)
Master_Log_File: mysql-bin.000079
Read_Master_Log_Pos: 584132
Executed_Gtid_Set: 58343797-e7f4-11e8-9c78-246e968f00b0:1-442
2、连续查看好多次slave的状态,发现
Slave_IO_Running: 由Yes变为 Connecting
3、slave的error日志信息:
2018-11-21T10:00:46.981201+08:00 2803 [Note] Error reading relay log event for channel '': slave SQL thread was killed
2018-11-21T10:00:46.981503+08:00 2802 [Note] Slave I/O thread killed while waiting to reconnect after a failed read for channel ''
2018-11-21T10:00:46.981522+08:00 2802 [Note] Slave I/O thread exiting for channel '', read up to log 'mysql-bin.000079', position 584132
2018-11-21T10:00:58.950843+08:00 2808 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-11-21T10:00:58.951239+08:00 2809 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'mysql-bin.000079' at position 584132, relay log '/data/mysql/relay-log.000001' position: 4
2018-11-21T10:00:58.951370+08:00 2808 [Note] Slave I/O thread for channel '': connected to master 'slave@192.168.16.12:3306',replication started in log 'mysql-bin.000079' at position 584132
4、查看master的进程:有多个slave复制账户的连接
master errror log:
[Note] While initializing dump thread for slave with UUID <30525ae8-e7f4-11e8-9af5-246e968f0048>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(9549).
2018-11-21T09:59:20.506354+08:00 2168119 [Note] While initializing dump thread for slave with UUID <30525ae8-e7f4-11e8-9af5-246e968f0048>, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(2167205).
2018-11-21T10:03:42.429940+08:00 2168119 [Note] Start binlog_dump to master_thread_id(2168119) slave_server(2), pos(, 4)
三、原因
此时master可能执行了一个大的insert or update,超过了30s master没有推送binlog给slave。导致slave重连。
slave 过了 slave-net-timeout(30)秒还没有收到主库来的数据,它就会开始第一次重试。重试的过程中,连上了主库,那么它认为当前主库是好的
然后再过 30s 则再一次重连主库。
所以此时才看到master上有多个slave的进程。
四、解决方法
将slave-net-timeout调大,设置为1小时
mysql> set global slave-net-time=3600;
五、错误的做法
看到二 出现的现象后,做了如下操作:
stop slave;
change master to master_user='192.168.16.12',master_port=3306,master_user='slave',master_password='',master_auto_position=1;
start slave;
此时查看slave的状态如下:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.16.12
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos:
Relay_Log_File: relay-log.000001
Relay_Log_Pos: 4
Relay_Master_Log_File: mysql-bin.000079
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Master_Server_Id: 1
Master_UUID: 58343797-e7f4-11e8-9c78-246e968f00b0
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: System lock
Master_Retry_Count: 86400
Master_SSL_Crlpath:
Retrieved_Gtid_Set:
Executed_Gtid_Set: 58343797-e7f4-11e8-9c78-246e968f00b0:1-442
Auto_Position: 1
1 row in set (0.00 sec)
slave无法找到master的file、pos位置点,会从master最开始的binlog进行读取,slave的日志如下:
[Note] Slave SQL thread for channel '' initialized, starting replication in log 'FIRST' at position 0, relay log '/data/mysql/relay-log.000001' position: 4
2018-11-21T10:05:47.912953+08:00 2820 [Note] Slave I/O thread for channel '': connected to master 'slave@192.168.16.12:3306',replication started in log 'FIRST' at position 4
2018-11-21T10:57:52.722093+08:00 31 [ERROR] Error reading packet from server for channel '': Cannot replicate anonymous transaction when AUTO_POSITION = 1, at file
/data/mysql/mysql-bin.000003, position 194.; the first event '' at 4, the last event read from '/data/mysql/mysql-bin.000003' at 259, the last b
yte read from '/data/mysql/mysql-bin.000003' at 259. (server_errno=1236)
2018-11-21T10:57:52.722133+08:00 31 [ERROR] Slave I/O for channel '': Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate anonymo
us transaction when AUTO_POSITION = 1, at file /data/mysql/mysql-bin.000003, position 194.; the first event '' at 4, the last event read from '/data/mysql
/mysql-bin.000003' at 259, the last byte read from '/data/mysql/mysql-bin.000003' at 259.', Error_code: 1236
此时需要先关闭auto_position,再设置file、pos点,最后再开启auto_position=1,具体如下:
set global slave-net-time=3600;
stop slave;
change master to master_auto_position=0;
change master to master_user='192.168.16.12',master_port=3306,master_user='slave',master_password='',master_log_file=mysql-bin.000079,master_log_pos=584132;
change master to master_auto_position=1;
start slave;
再次查看slave的状态,可以看到状态正常:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.16.12
Master_User: slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000079
Read_Master_Log_Pos: 175299608
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 34778333
Relay_Master_Log_File: mysql-bin.000079
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 35362051
Relay_Log_Space: 174716091
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 3317
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_UUID: 58343797-e7f4-11e8-9c78-246e968f00b0
Master_Info_File: mysql.slave_master_info
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: System lock
Master_Retry_Count: 86400
Master_Bind:
Last_IO_Error_Timestamp:
Last_SQL_Error_Timestamp:
Master_SSL_Crl:
Master_SSL_Crlpath:
Retrieved_Gtid_Set: 58343797-e7f4-11e8-9c78-246e968f00b0:443-20481
Executed_Gtid_Set: 58343797-e7f4-11e8-9c78-246e968f00b0:1-4431
Auto_Position: 1
Replicate_Rewrite_DB:
Channel_Name:
Master_TLS_Version:
1 row in set (0.00 sec)