pt-online-schema-change breaks AWS DMS Replication


We are currently using AWS DMS to replicate data from our Aurora MySQL database to S3. This gives us a low-latency data lake that we can use to track the lineage of all changes and to build additional data pipelines on. However, when we make a change with the pt-online-schema-change script, the modified table stops replicating entirely. Is there any reason why this would happen?

After running the change, the logs show that the schema of the source table no longer matches what DMS is expecting, and the CDC changes are skipped. The only explanation I can see is that DMS is not properly tracking the DDL statements.

  1. Table alter is triggered with Percona (in this case, an added column)
  2. The new table is synced by AWS DMS
  3. The trigger creation statements raise "not supported" warnings in AWS DMS
  4. The table is renamed
  5. DMS reports that the table column count does not match and ignores the extra columns
  6. DMS reports a table column size mismatch and skips the changes

Notably, all the statements Percona runs (apart from the triggers) are supported by AWS DMS with S3 as a target. Does anyone else have experience with this situation or this combination of tools?

Edit:

Here's an example of the command used to make these changes with Percona:

pt-online-schema-change --host=<host> \
                        --user=<user> \
                        --ask-pass \
                        --execute \
                        --no-drop-old-table \
                        --no-check-alter \
                        --alter="ADD COLUMN db_row_update_stamp  TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3) ON UPDATE CURRENT_TIMESTAMP(3)" \
                        D=<db>,t=<REPLACE_TABLE_NAME_HERE>

There are 2 answers

jardis On

I tested a RENAME scenario in AWS DMS manually and looked at the DETAILED_DEBUG logs for the task.

This produced the following:

2021-02-27T00:38:43:255381 [SOURCE_CAPTURE  ]T:  Event timestamp '2021-02-27 00:38:38' (1614386318), pos 54593835  (mysql_endpoint_capture.c:3293)
2021-02-27T00:38:43:255388 [SOURCE_CAPTURE  ]D:  > QUERY_EVENT  (mysql_endpoint_capture.c:3306)
2021-02-27T00:38:43:255394 [SOURCE_CAPTURE  ]T:  Default DB = 'my_db'  (mysql_endpoint_capture.c:1713)
2021-02-27T00:38:43:255399 [SOURCE_CAPTURE  ]T:  SQL statement = 'RENAME TABLE test_table TO _test_table_old'  (mysql_endpoint_capture.c:1720)
2021-02-27T00:38:43:255409 [SOURCE_CAPTURE  ]T:  DDL DB = '', table = '', verb = 0  (mysql_endpoint_capture.c:1734)
2021-02-27T00:38:43:255414 [SOURCE_CAPTURE  ]T:  >>> Unsupported or commented out DDL: 'RENAME TABLE test_table TO _test_table_old'  (mysql_endpoint_capture.c:1742)

It seems that this version of DMS does not properly read RENAME statements, even though the documentation claims that RENAME is supported.

I am looking into opening a bug report with AWS. The affected AWS DMS version is 3.4.3.

I will test against previous versions and post an update if I find one where this works, as a stopgap until it is resolved in a newer version. I can't claim with 100% certainty that it's a bug in DMS, but with Percona taken out of the picture I was still able to reproduce the problem.
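If you want to check whether your own task is dropping DDL the same way, here is a rough Python sketch that scans log lines for the "Unsupported or commented out DDL" marker shown in the excerpt above. The regex is based only on those quoted lines, so other DMS versions may word the message differently. The function name is made up for illustration.

```python
import re

# Pattern derived from the log excerpt above (DMS 3.4.3); this is an
# assumption about the format, not a documented contract.
UNSUPPORTED_DDL = re.compile(
    r">>> Unsupported or commented out DDL: '(?P<sql>.*)'\s+\("
)


def find_unsupported_ddl(lines):
    """Return the SQL statements that DMS logged as unsupported DDL."""
    found = []
    for line in lines:
        match = UNSUPPORTED_DDL.search(line)
        if match:
            found.append(match.group("sql"))
    return found


# Two lines copied from the excerpt above; only the second carries the marker.
sample = [
    "2021-02-27T00:38:43:255399 [SOURCE_CAPTURE  ]T:  SQL statement = "
    "'RENAME TABLE test_table TO _test_table_old'  (mysql_endpoint_capture.c:1720)",
    "2021-02-27T00:38:43:255414 [SOURCE_CAPTURE  ]T:  >>> Unsupported or commented out DDL: "
    "'RENAME TABLE test_table TO _test_table_old'  (mysql_endpoint_capture.c:1742)",
]
print(find_unsupported_ddl(sample))
```

Feeding it the lines of an exported task log should list every statement the capture process skipped, which makes it easier to see whether RENAME is the only one affected.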

rock_walker On

The only option I know of for fixing the broken replication is "Reload table data".

In the AWS DMS console, select your migration task, go to the "Table statistics" tab, select the table that was altered (RENAMEd under the hood by the Percona tool), and choose "Reload table data".

In a nutshell, the "Reload table data" action refreshes the replication instance, cleans up caches, and creates a new snapshot of your data in S3. Creating the new snapshot leaves your replication out of sync for some time. The recovery time depends roughly linearly on the volume of data in the table and the performance of the chosen replication instance.
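If you run schema changes often, the same reload can probably be scripted instead of clicked. Below is a minimal Python sketch, assuming boto3 and its DMS client's reload_tables call. The helper name and the placeholder values in the usage comment are made up for illustration. As far as I know the task has to be in the running state for this call to succeed.

```python
def reload_table(dms_client, task_arn, schema, table):
    """Ask DMS to reload a single table of a replication task.

    dms_client is expected to be a boto3 DMS client, e.g. boto3.client("dms").
    """
    return dms_client.reload_tables(
        ReplicationTaskArn=task_arn,
        TablesToReload=[{"SchemaName": schema, "TableName": table}],
        # "data-reload" reloads the table data; "validate-only" re-validates
        # without reloading.
        ReloadOption="data-reload",
    )


# Usage (placeholders, not real values):
#   import boto3
#   reload_table(boto3.client("dms"), "<task-arn>", "<db>", "<table>")
```

Passing the client in rather than creating it inside the helper keeps credentials and region handling outside the function, and lets you try it against a stub before pointing it at a real task.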

Unfortunately, the RENAME TABLE statement isn't supported by DMS, and I assume no other tool could support it properly either, since it breaks data checksums (or checkpoints, in AWS terms).