MySQL InnoDB deadlock problems on REPLACE INTO


I want to update a statistics count in MySQL.

The SQL is as follow:

REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)

Schema :

CREATE TABLE `record_amount` (
   `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
   `owner` varchar(50) NOT NULL ,
   `source` varchar(50) NOT NULL ,
   `day_time` varchar(10) NOT NULL,
   `count` int(11) NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

However, it causes a DEADLOCK exception when multiple processes run it concurrently (i.e. from Map-Reduce workers).

I've read some material online and I'm confused about the locks involved. I know InnoDB uses row-level locking. I could just use a table lock to solve the business problem, but that seems a little extreme. I found some possible solutions (both sketched below):

  1. change REPLACE INTO to a transaction with SELECT id ... FOR UPDATE and then UPDATE
  2. change REPLACE INTO to INSERT ... ON DUPLICATE KEY UPDATE
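
For reference, here is roughly what I mean by each of those, written against the record_amount table above; the exact transaction handling and bind parameters are only a sketch:

-- Option 1: lock the row (if any) inside a transaction, then update or insert
START TRANSACTION;
SELECT `id` FROM `record_amount`
    WHERE `owner` = ? AND `source` = ? AND `day_time` = ?
    FOR UPDATE;
-- if a row came back:
UPDATE `record_amount` SET `count` = ? WHERE `id` = ?;
-- otherwise:
INSERT INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?);
COMMIT;

-- Option 2: single-statement upsert, same overwrite semantics as REPLACE
INSERT INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
    ON DUPLICATE KEY UPDATE `count` = VALUES(`count`);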

I have no idea which one is more practical and better. Can someone explain, or offer some links for me to read and study? Thank you!


1 Answer

Rick James

Are you building a summary table, one source row at a time? And effectively doing UPDATE ... count = count+1? Throw away the code and start over. MAP-REDUCE on that is like using a sledgehammer on a thumbtack.

INSERT INTO summary (source, owner, day_time, count)
    SELECT source, owner, day_time, COUNT(*)
        FROM raw
        GROUP BY source, owner, day_time
    ON DUPLICATE KEY UPDATE count = count + VALUES(count);

A single statement approximately like that will do all the work at virtually disk I/O speed. No SELECT ... FOR UPDATE. No deadlocks. No multiple threads. Etc.
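
With the table from the question as the summary, and assuming the per-event detail rows land in some staging table (just called raw here for illustration), that is roughly:

INSERT INTO record_amount (source, owner, day_time, count)
    SELECT source, owner, day_time, COUNT(*)
        FROM raw
        GROUP BY source, owner, day_time
    ON DUPLICATE KEY UPDATE count = count + VALUES(count);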

Further improvements:

  • Get rid of the AUTO_INCREMENT; turn the UNIQUE key into the PRIMARY KEY.
  • day_time -- is that a DATETIME truncated to an hour? (Or something like that.) Use DATETIME; you will have much more flexibility in querying. (A revised table definition is sketched below.)
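
Putting those two changes together, the definition might look something like this (assuming day_time really is a date/time value):

CREATE TABLE record_amount (
    owner varchar(50) NOT NULL,
    source varchar(50) NOT NULL,
    day_time datetime NOT NULL,
    count int NOT NULL,
    PRIMARY KEY (owner, source, day_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;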

To discuss further, please elaborate on the source data (CREATE TABLE, number of rows, frequency of processing, etc.) and other details. If this is really a Data Warehouse application with a Summary table, I may have more suggestions.

If the data is coming from a file, do LOAD DATA to shovel it into a temp table raw so that the above INSERT..SELECT can work. If it is of manageable size, make raw Engine=MEMORY to avoid any I/O for it.
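
A minimal sketch of that staging step; the file name, field delimiter, and column layout here are only placeholders:

CREATE TEMPORARY TABLE raw (
    owner varchar(50) NOT NULL,
    source varchar(50) NOT NULL,
    day_time datetime NOT NULL
) ENGINE=MEMORY;

LOAD DATA LOCAL INFILE '/path/to/feed.csv'
    INTO TABLE raw
    FIELDS TERMINATED BY ','
    (owner, source, day_time);

After that, the INSERT ... SELECT above aggregates raw into the summary table in one pass.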

If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.