I want to update the statistic count in mysql.
The SQL is as follow:
REPLACE INTO `record_amount`(`source`,`owner`,`day_time`,`count`) VALUES (?,?,?,?)
Schema :
CREATE TABLE `record_amount` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'id',
`owner` varchar(50) NOT NULL ,
`source` varchar(50) NOT NULL ,
`day_time` varchar(10) NOT NULL,
`count` int(11) NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `src_time` (`owner`,`source`,`day_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
However, it caused a DEADLOCK exception in multi-processes running (i.e. Map-Reduce).
I've read some materials online and confused about those locks. I know innodb uses row-level lock. I can just use the table-lock to solve the business problem but it is a little extreme. I found some possible solutions:
- change
REPLACE INTOto transaction withSELECT id FOR UPDATEandUPDATE - change
REPLACE INTOtoINSERT ... ON DUPLICATE KEY UPDATE
I have no idea that which is practical and better. Can someone explain it or offer some links for me to read and study? Thank you!
Are you building a summary table, one source row at a time? And effectively doing
UPDATE ... count = count+1? Throw away the code and start over. MAP-REDUCE on that is like using a sledge hammer on a thumbtack.A single statement approximately like that will do all the work at virtually disk I/O speed. No
SELECT ... FOR UPDATE. No deadlocks. No multiple threads. Etc.Further improvements:
AUTO_INCREMENT; turn theUNIQUEintoPRIMARY KEY.DATETIMEtruncated to an hour? (Or something like that.) UseDATETIME, you will have much more flexibility in querying.To discuss further, please elaborate on the source data (`CREATE TABLE, number of rows, frequency of processing, etc) and other details. If this is really a Data Warehouse application with a Summary table, I may have more suggestions.
If the data is coming from a file, do
LOAD DATAto shovel it into a temp tablerawso that the aboveINSERT..SELECTcan work. If it is of manageable size, makerawEngine=MEMORYto avoid any I/O for it.If you have multiple feeds, my high-speed-ingestion blog discusses how to have multiple threads without any deadlocks.