I want to update a statistics count in MySQL.
The SQL is as follows:
REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
Schema :
CREATE TABLE `record_amount` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`owner` varchar(50) NOT NULL ,
`source` varchar(50) NOT NULL ,
`day_time` varchar(10) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it caused a DEADLOCK exception when run from multiple processes (i.e. Map-Reduce).
I've read some materials online and am confused about those locks. I know InnoDB uses row-level locks. I could just use a table lock to solve the business problem, but that seems a little extreme. I found some possible solutions:
- change `REPLACE INTO` to a transaction with `SELECT id FOR UPDATE` and `UPDATE`
- change `REPLACE INTO` to `INSERT ... ON DUPLICATE KEY UPDATE`
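Roughly, I think the two alternatives would look like this (just my own sketch; I have not verified the locking behaviour of either):

-- Option 1: transaction with SELECT ... FOR UPDATE, then UPDATE or INSERT
START TRANSACTION;
SELECT `id` FROM `record_amount`
 WHERE `owner` = ? AND `source` = ? AND `day_time` = ? FOR UPDATE;
-- if a row came back:
UPDATE `record_amount` SET `count` = ? WHERE `id` = ?;
-- if no row came back:
INSERT INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?);
COMMIT;

-- Option 2: single statement
INSERT INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
ON DUPLICATE KEY UPDATE `count` = VALUES(`count`);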
I have no idea which is practical and better. Can someone explain, or offer some links for me to read and study? Thank you!
Are you building a summary table, one source row at a time? And effectively doing `UPDATE ... count = count+1`? Throw away the code and start over. MAP-REDUCE on that is like using a sledgehammer on a thumbtack.

A single statement approximately like the sketch below will do all the work at virtually disk I/O speed. No `SELECT ... FOR UPDATE`. No deadlocks. No multiple threads. Etc.
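Something along these lines (only a sketch; it assumes the raw rows live in a table I'll call `source_data`, so swap in your real input table and columns):

INSERT INTO record_amount (owner, source, day_time, `count`)
    SELECT owner, source, day_time, COUNT(*)
        FROM source_data            -- hypothetical name for your raw input table
        GROUP BY owner, source, day_time
    ON DUPLICATE KEY UPDATE `count` = `count` + VALUES(`count`);

The GROUP BY does the counting in one pass over the input; the ON DUPLICATE KEY UPDATE folds the totals into the summary table against the unique key, so there is no second statement to deadlock with.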
Further improvements:
- Get rid of the `AUTO_INCREMENT` id; turn the `UNIQUE` key into the `PRIMARY KEY` (a sketch of the reworked table is below).
- Is `day_time` a `DATETIME` truncated to an hour? (Or something like that.) Use `DATETIME`; you will have much more flexibility in querying.

To discuss further, please elaborate on the source data (`CREATE TABLE`, number of rows, frequency of processing, etc.) and other details. If this is really a Data Warehouse application with a Summary table, I may have more suggestions.
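A minimal sketch of the table with those changes applied (assuming `day_time` really is a date/time value):

CREATE TABLE record_amount (
  owner    varchar(50) NOT NULL,
  source   varchar(50) NOT NULL,
  day_time datetime    NOT NULL,   -- truncated to the day/hour being counted
  `count`  int         NOT NULL,
  PRIMARY KEY (owner, source, day_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;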
If the data is coming from a file, do `LOAD DATA` to shovel it into a temp table `raw` so that the above `INSERT .. SELECT` can work. If it is of manageable size, make `raw` `ENGINE=MEMORY` to avoid any I/O for it.

If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.
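A rough sketch of that staging step (the file path and column layout are placeholders; adjust them to your actual feed):

CREATE TEMPORARY TABLE raw (
  owner    varchar(50) NOT NULL,
  source   varchar(50) NOT NULL,
  day_time datetime    NOT NULL
) ENGINE=MEMORY;

LOAD DATA LOCAL INFILE '/path/to/feed.csv'   -- placeholder path
    INTO TABLE raw
    FIELDS TERMINATED BY ','
    (owner, source, day_time);

Then run the `INSERT .. SELECT` from above against `raw`, and truncate `raw` before the next batch.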