Percona TokuDB recovery running over 24 hours

250 views Asked by At

Background
Yesterday, we had a situation that presented (at least to my eyes) like a deadlock. A Truncate table (running against an un-contended) table, had been issued, it's status was "query completed" but never cleared from the process list. Further, user connections to unrelated tables within that schema were unable to connect with "waiting for metadata lock (or similar, didn't get the screenshot)".

A first attempt to kill the truncate query, then then the hung queries resulted in their status changing to "killed" but never had them disappear from the process list.

At this point, even a superuser was unable to login (a previously opened monitor showed the connection as -- waiting for metadata lock).

An orderly shutdown was attempted but after some lengthy period, it looks like the OS kill -9'd mysql. As a bunch of memory unexpectedly seemed pinned, a reboot seemed in order and was done.

Now, for the past 24ish hours, Percona seems to be either replaying (--very--slowly) a number of transactions I can't imagine or is in a replay loop of some kind. To give you a sense of the log messages:

2015-08-07 09:01:42 25894 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-08-07 09:01:42 25894 [Note] InnoDB: Memory barrier is not used
2015-08-07 09:01:42 25894 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-08-07 09:01:42 25894 [Note] InnoDB: Using Linux native AIO
2015-08-07 09:01:42 25894 [Note] InnoDB: Using CPU crc32 instructions
2015-08-07 09:01:42 25894 [Note] InnoDB: Initializing buffer pool, size = 20.0G
2015-08-07 09:01:43 25894 [Note] InnoDB: Completed initialization of buffer pool
2015-08-07 09:01:44 25894 [Note] InnoDB: Highest supported file format is Barracuda.
2015-08-07 09:01:44 25894 [Note] InnoDB: The log sequence numbers 551884808204 and 551884808204 in ibdata files do not match the log sequence number 551884808684 in the ib_logfiles!
2015-08-07 09:01:44 25894 [Note] InnoDB: Database was not shutdown normally!
2015-08-07 09:01:44 25894 [Note] InnoDB: Starting crash recovery.
2015-08-07 09:01:44 25894 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-08-07 09:01:44 25894 [Note] InnoDB: Restoring possible half-written data pages
2015-08-07 09:01:44 25894 [Note] InnoDB: from the doublewrite buffer...
2015-08-07 09:01:44 25894 [Note] InnoDB: starting tracking changed pages from LSN 551884808684
InnoDB: Transaction 122307881 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 122333952
InnoDB: Last MySQL binlog file position 0 134745064, file name mysqld-bin.000281
2015-08-07 09:01:44 25894 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2015-08-07 09:01:44 7f7b96bff700  InnoDB: Rollback of non-prepared transactions completed
2015-08-07 09:01:44 25894 [Note] InnoDB: Waiting for purge to start
2015-08-07 09:01:44 25894 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 551884808684
Fri Aug  7 09:01:44 2015 TokuFT recovery starting in env /var/lib/mysql/
Fri Aug  7 09:01:45 2015 TokuFT recovery scanning backward from 957963270
Fri Aug  7 09:01:45 2015 TokuFT recovery bw_end_checkpoint at 957917890 timestamp 1438892691965548 xid 957841798 (bw_newer)
Fri Aug  7 09:01:45 2015 TokuFT recovery bw_begin_checkpoint at 957841798 timestamp 1438892662604852 (bw_between)
Fri Aug  7 09:01:45 2015 TokuFT recovery turning around at begin checkpoint 957841798 time 29360696
Fri Aug  7 09:01:45 2015 TokuFT recovery starts scanning forward to 957963270 from 957841798 left 121472 (fw_between)
Fri Aug  7 09:02:09 2015 TokuFT lsn 957930340 commit xid 405185:0 1909760/293672232 1%
Fri Aug  7 09:02:24 2015 TokuFT lsn 957930340 commit xid 405185:0 3742720/293672232 1%
Fri Aug  7 09:02:39 2015 TokuFT lsn 957930340 commit xid 405185:0 5556224/293672232 2%
Fri Aug  7 09:02:54 2015 TokuFT lsn 957930340 commit xid 405185:0 7327744/293672232 2%
Fri Aug  7 09:03:09 2015 TokuFT lsn 957930340 commit xid 405185:0 8884224/293672232 3%
-- snip --
Fri Aug  7 09:21:09 2015 TokuFT lsn 957930340 commit xid 405185:0 150720512/293672232 51%
Fri Aug  7 09:21:24 2015 TokuFT lsn 957930340 commit xid 405185:0 152825856/293672232 52%
Fri Aug  7 09:21:39 2015 TokuFT lsn 957930340 commit xid 405185:0 154894336/293672232 53%
150807 09:21:42 mysqld_safe Transparent huge pages are already set to: never.
150807 09:21:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2015-08-07 09:21:42 0 [Note] /usr/sbin/mysqld (mysqld 5.6.25-73.1-log) starting as process 29206 ...
2015-08-07 09:21:42 29206 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-08-07 09:21:42 29206 [Note] InnoDB: The InnoDB memory heap is disabled
2015-08-07 09:21:42 29206 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-08-07 09:21:42 29206 [Note] InnoDB: Memory barrier is not used
2015-08-07 09:21:42 29206 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-08-07 09:21:42 29206 [Note] InnoDB: Using Linux native AIO
2015-08-07 09:21:42 29206 [Note] InnoDB: Using CPU crc32 instructions
2015-08-07 09:21:42 29206 [Note] InnoDB: Initializing buffer pool, size = 20.0G
2015-08-07 09:21:43 29206 [Note] InnoDB: Completed initialization of buffer pool
2015-08-07 09:21:44 29206 [Note] InnoDB: Highest supported file format is Barracuda.
2015-08-07 09:21:44 29206 [Note] InnoDB: The log sequence numbers 551884808204 and 551884808204 in ibdata files do not match the log sequence number 551884808694 in the ib_logfiles!
2015-08-07 09:21:44 29206 [Note] InnoDB: Database was not shutdown normally!
2015-08-07 09:21:44 29206 [Note] InnoDB: Starting crash recovery.
2015-08-07 09:21:44 29206 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-08-07 09:21:44 29206 [Note] InnoDB: Restoring possible half-written data pages
2015-08-07 09:21:44 29206 [Note] InnoDB: from the doublewrite buffer...
2015-08-07 09:21:44 29206 [Note] InnoDB: starting tracking changed pages from LSN 551884808694
InnoDB: Transaction 122307881 was in the XA prepared state.
InnoDB: 1 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 122334464
InnoDB: Last MySQL binlog file position 0 134745064, file name mysqld-bin.000281
2015-08-07 09:21:45 29206 [Note] InnoDB: 128 rollback segment(s) are active.
InnoDB: Starting in background the rollback of uncommitted transactions
2015-08-07 09:21:45 7feb457ff700  InnoDB: Rollback of non-prepared transactions completed
2015-08-07 09:21:45 29206 [Note] InnoDB: Waiting for purge to start
2015-08-07 09:21:45 29206 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 551884808694
Fri Aug  7 09:21:45 2015 TokuFT recovery starting in env /var/lib/mysql/
Fri Aug  7 09:21:45 2015 TokuFT recovery scanning backward from 957963270
Fri Aug  7 09:21:45 2015 TokuFT recovery bw_end_checkpoint at 957917890 timestamp 1438892691965548 xid 957841798 (bw_newer)
Fri Aug  7 09:21:46 2015 TokuFT recovery bw_begin_checkpoint at 957841798 timestamp 1438892662604852 (bw_between)
Fri Aug  7 09:21:46 2015 TokuFT recovery turning around at begin checkpoint 957841798 time 29360696
Fri Aug  7 09:21:46 2015 TokuFT recovery starts scanning forward to 957963270 from 957841798 left 121472 (fw_between)
Fri Aug  7 09:22:10 2015 TokuFT lsn 957930340 commit xid 405185:0 2105344/293672232 1%
Fri Aug  7 09:22:25 2015 TokuFT lsn 957930340 commit xid 405185:0 3942400/293672232 1%
Fri Aug  7 09:22:40 2015 TokuFT lsn 957930340 commit xid 405185:0 5748736/293672232 2%
Fri Aug  7 09:22:55 2015 TokuFT lsn 957930340 commit xid 405185:0 7554048/293672232 3%
Fri Aug  7 09:23:10 2015 TokuFT lsn 957930340 commit xid 405185:0 9150464/293672232 3%
-- snip --
Fri Aug  7 09:25:40 2015 TokuFT lsn 957930340 commit xid 405185:0 28329984/293672232 10%
Fri Aug  7 09:25:55 2015 TokuFT lsn 957930340 commit xid 405185:0 30284800/293672232 10%
Fri Aug  7 09:26:10 2015 TokuFT lsn 957930340 commit xid 405185:0 32245760/293672232 11%
Fri Aug  7 09:26:25 2015 TokuFT lsn 957930340 commit xid 405185:0 34166784/293672232 12%
Fri Aug  7 09:26:40 2015 TokuFT lsn 957930340 commit xid 405185:0 36102144/293672232 12%
Fri Aug  7 09:26:55 2015 TokuFT lsn 957930340 commit xid 405185:0 38104064/293672232 13%
Fri Aug  7 09:27:10 2015 TokuFT lsn 957930340 commit xid 405185:0 40033280/293672232 14%
Fri Aug  7 09:27:25 2015 TokuFT lsn 957930340 commit xid 405185:0 41992192/293672232 14%
-- snip --
Fri Aug  7 09:31:10 2015 TokuFT lsn 957930340 commit xid 405185:0 70963200/293672232 24%
Fri Aug  7 09:31:25 2015 TokuFT lsn 957930340 commit xid 405185:0 72942592/293672232 25%
Fri Aug  7 09:31:40 2015 TokuFT lsn 957930340 commit xid 405185:0 74896384/293672232 26%
Fri Aug  7 09:31:55 2015 TokuFT lsn 957930340 commit xid 405185:0 76867584/293672232 26%
Fri Aug  7 09:32:10 2015 TokuFT lsn 957930340 commit xid 405185:0 78902272/293672232 27%
Fri Aug  7 09:32:25 2015 TokuFT lsn 957930340 commit xid 405185:0 80914432/293672232 28%
Fri Aug  7 09:32:40 2015 TokuFT lsn 957930340 commit xid 405185:0 82846720/293672232 28%
Fri Aug  7 09:32:55 2015 TokuFT lsn 957930340 commit xid 405185:0 84836352/293672232 29%
Fri Aug  7 09:33:10 2015 TokuFT lsn 957930340 commit xid 405185:0 86836224/293672232 30%
--- snip --
Fri Aug  7 09:36:25 2015 TokuFT lsn 957930340 commit xid 405185:0 113692672/293672232 39%
Fri Aug  7 09:36:40 2015 TokuFT lsn 957930340 commit xid 405185:0 115834880/293672232 39%
Fri Aug  7 09:36:55 2015 TokuFT lsn 957930340 commit xid 405185:0 118014976/293672232 40%
Fri Aug  7 09:37:10 2015 TokuFT lsn 957930340 commit xid 405185:0 120132608/293672232 41%
Fri Aug  7 09:37:25 2015 TokuFT lsn 957930340 commit xid 405185:0 122251264/293672232 42%
Fri Aug  7 09:37:40 2015 TokuFT lsn 957930340 commit xid 405185:0 124372992/293672232 42%
-- snip --
Fri Aug  7 09:38:40 2015 TokuFT lsn 957930340 commit xid 405185:0 133026816/293672232 45%
Fri Aug  7 09:38:55 2015 TokuFT lsn 957930340 commit xid 405185:0 135154688/293672232 46%
Fri Aug  7 09:39:10 2015 TokuFT lsn 957930340 commit xid 405185:0 137283584/293672232 47%
Fri Aug  7 09:39:25 2015 TokuFT lsn 957930340 commit xid 405185:0 139439104/293672232 47%
-- snip --
Fri Aug  7 09:41:10 2015 TokuFT lsn 957930340 commit xid 405185:0 154284032/293672232 53%
Fri Aug  7 09:41:25 2015 TokuFT lsn 957930340 commit xid 405185:0 156420096/293672232 53%
Fri Aug  7 09:41:40 2015 TokuFT lsn 957930340 commit xid 405185:0 158516224/293672232 54%
150807 09:41:42 mysqld_safe Transparent huge pages are already set to: never.
150807 09:41:42 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql

So while the log sequence and the TokuFT commit xid numbers seems to stay static, the scanned checkpoints and % complete (always ~49-55%) change from run to run.

It's unclear if this is somehow unrolling some gigantic transaction (the user was doing some massive table updates, DDL changes, etc and I'm convinced had autocommit=0) or this is a loop that's never going to end.

Any hints on how to proceed or troubleshoot further?

Thanks.

1

There are 1 answers

0
MDB On

A case of looking in all the wrong places. SystemD was "helpfully" timing out the startup of the server and kicking off a full restart of MySQL.

A good thing to keep in mind if you have a long xlog to replay - temporarily disable systemd's timeout (or up it from the default of 10 minutes).