Broken MySQL GTID replication (misaligned GTIDs)

I'm using Percona MySQL 5.6 with slave_parallel_workers=5 on Debian 8. Sometimes GTID replication breaks and I don't know why. I thought that GTIDs are executed in consecutive order, but the output of SHOW SLAVE STATUS shows gaps in Executed_Gtid_Set:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: d22.local
                  Master_User: xyz
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.039232
          Read_Master_Log_Pos: 219044
               Relay_Log_File: mysqld-relay-bin.072392
                Relay_Log_Pos: 90640
        Relay_Master_Log_File: mysql-bin.036196
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB: xyz_etl
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1032
                   Last_Error: Could not execute Update_rows event on table xyz.sessions; Can't find record in 'sessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.036196, end_log_pos 78709552
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 78708927
              Relay_Log_Space: 1337994488
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1032
               Last_SQL_Error: Could not execute Update_rows event on table xyz.sessions; Can't find record in 'sessions', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.036196, end_log_pos 78709552
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 22
                  Master_UUID: 0e7b97a8-a689-11e5-8b79-901b0e8b0f53
             Master_Info_File: /var/lib/mysql/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State:
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp: 161219 20:32:20
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 0e7b97a8-a689-11e5-8b79-901b0e8b0f53:60397-45157441
            Executed_Gtid_Set: 0e7b97a8-a689-11e5-8b79-901b0e8b0f53:1-42679868:42679870-42679876:42679878-42679879:42679881-42679890:42679892-42679908:42679910:42679913:42679916-42679917:42679919-42679927:42679929-42679932:42679934:42679936:42679938-42679939:42679944:42679946-42679950:42679952-42679955:42679957-42679964:42679966:42679969-42679970:42679972:42679974-42679977:42679979-42679980:42679984-42679986:42679988-42679990:42679994-42679996:42679998:42680000-42680001:42680003-42680006:42680009-42680011:42680013-42680018:42680021:42680024:42680026:42680030:42680032:42680035:42680038,
aea3618e-bacf-11e6-9506-b8ca3a67f830:1-10937274
                Auto_Position: 1
1 row in set (0.00 sec)

I'm a bit confused. slave_parallel_workers is set to 0 now, but the error reported above belongs to GTID 42679909 rather than 42679868 as I expected. What is the reason for this, and what are the correct steps to repair a broken replication like the one above? What I don't understand is that the transaction with GTID 42679869 could, in theory, be executed without problems, yet STOP SLAVE; START SLAVE; does not process it.

There is 1 answer

rabudde (BEST ANSWER)

To answer this and help others, here are the steps I took:

  • Set slave_parallel_workers=0.
  • Pay attention only to the Executed_Gtid_Set field and handle the gaps in the GTID list one after another with STOP SLAVE; SET GTID_NEXT="[...]"; BEGIN; COMMIT; SET GTID_NEXT="AUTOMATIC"; START SLAVE;
  • Once replication continues on its own without errors, set slave_parallel_workers back to its previous value.
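
Finding the gaps by eye in a long Executed_Gtid_Set is error-prone, so here is a minimal Python sketch of how they could be listed and turned into the empty-transaction statements from the steps above. The function names and the shortened example set are my own assumptions for illustration, not part of MySQL or any tool. The script only prints SQL and does not connect to a server. Batching all gaps between one STOP SLAVE and one START SLAVE is also an assumption; the steps above handle them one at a time.

```python
def parse_intervals(gtid_set, uuid):
    """Return sorted inclusive (start, end) intervals for one server UUID."""
    intervals = []
    # SHOW SLAVE STATUS wraps the set over several lines, one UUID per entry.
    for part in gtid_set.replace("\n", "").split(","):
        fields = part.strip().split(":")
        if fields[0] != uuid:
            continue
        for item in fields[1:]:
            low, _, high = item.partition("-")
            # A single number such as "42679910" is an interval of length one.
            intervals.append((int(low), int(high or low)))
    return sorted(intervals)


def missing_gtids(intervals):
    """Yield the transaction numbers lying in gaps between the intervals."""
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        yield from range(prev_end + 1, next_start)


def skip_statements(uuid, numbers):
    """Build the statements that commit an empty transaction per GTID."""
    statements = ["STOP SLAVE;"]
    for number in numbers:
        statements.append(f"SET GTID_NEXT='{uuid}:{number}'; BEGIN; COMMIT;")
    statements += ["SET GTID_NEXT='AUTOMATIC';", "START SLAVE;"]
    return statements


# Shortened excerpt of the Executed_Gtid_Set from the question (assumption:
# only the first three intervals are kept to keep the example small).
MASTER_UUID = "0e7b97a8-a689-11e5-8b79-901b0e8b0f53"
EXAMPLE_SET = (
    MASTER_UUID + ":1-42679868:42679870-42679876:42679878-42679879,\n"
    "aea3618e-bacf-11e6-9506-b8ca3a67f830:1-10937274"
)

if __name__ == "__main__":
    gaps = list(missing_gtids(parse_intervals(EXAMPLE_SET, MASTER_UUID)))
    for statement in skip_statements(MASTER_UUID, gaps):
        print(statement)
```

Keep in mind that an empty transaction only marks the GTID as executed; the changes of the original transaction are not applied on the slave, so the affected rows may have to be checked or resynced afterwards.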