Optimize MySQL on duplicate key update with conditions to update

97 views Asked by At

I have a working solution, but I'm not sure this might be the best solution.

My goal: there's a table (testtable) with a column status. When the current status is 'CXL' and a insert query is run, the statusUpdateDateTime should be newer than the existing value for that row and if so, for column status there should be an extra check. If the previous value was 'CXL' and the current value is 'ACPT', the value should stay at 'CXL', otherwise the value must be updated to the new value.

For the other columns, there's only a 'newer' check necessary.

below is the test table (in practice the unique key will contain multiple columns)

CREATE TABLE `testtable` (
  `rowid` int unsigned NOT NULL AUTO_INCREMENT,
  `mappedId` int unsigned NOT NULL,
  `status` varchar(10) NOT NULL,
  `statusupdatedatetime` datetime(1) NOT NULL,
  `message` varchar(256) DEFAULT NULL,
  `message_identifier` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`rowid`),
  UNIQUE KEY `unq_trade_reporting_status_table` (`mappedId`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin2

this will be my initial insert query:

INSERT IGNORE INTO `testtable` (`mappedId`,`status`,`statusupdatedatetime`,`message`,`message_identifier`) VALUES 
(1,'RJCT','2023-02-05 13:39:00.0','Rejected',''), 
(2,'ACPT','2023-02-05 13:39:00.0','Accepted',''),
(3,'PNDG','2023-02-05 13:40:00.0','Pending',''),
(4,'CXL','2023-02-05 13:50:00.0','Accepted','some message')
AS newData
ON DUPLICATE KEY UPDATE 
`testtable`.`status` = 
CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime`
THEN 
(
    CASE WHEN IFNULL(STRCMP(`testtable`.`status`, 'CXL'),-1)=0  AND IFNULL(STRCMP(newData.`status`, 'ACPT'),-1)=0
    THEN 'CXL' 
    ELSE newData.`status`
    END
)
ELSE `testtable`.`status` END
,`testtable`.`message` = CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime` THEN newData.`message` ELSE `testtable`.`message` END
,`testtable`.`message_identifier` = CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime` THEN newData.`message_identifier` ELSE `testtable`.`message_identifier` END
,`testtable`.`statusupdatedatetime` = CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime` THEN newData.`statusupdatedatetime` ELSE `testtable`.`statusupdatedatetime` END;

the row with mappedId=4 has a status value of 'CXL'.

when I run this query, the value stays the same (which is intended). Mind the higher date time and 'ACPT' value for status.

INSERT IGNORE INTO `testtable` (`mappedId`,`status`,`statusupdatedatetime`,`message`,`message_identifier`) VALUES 
(1,'RJCT','2023-02-05 13:39:00.0','Rejected',''), 
(2,'ACPT','2023-02-05 13:39:00.0','Accepted',''),
(3,'PNDG','2023-02-05 13:40:00.0','Pending',''),
(4,'ACPT','2023-02-05 13:51:00.0','Accepted','some message')
AS newData
ON DUPLICATE KEY UPDATE 
`testtable`.`status` = 
CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime`
THEN 
(
    CASE WHEN IFNULL(STRCMP(`testtable`.`status`, 'CXL'),-1)=0  AND IFNULL(STRCMP(newData.`status`, 'ACPT'),-1)=0
    THEN 'CXL' 
    ELSE newData.`status`
    END
)
ELSE `testtable`.`status` END
,`testtable`.`message` = CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime` THEN newData.`message` ELSE `testtable`.`message` END
,`testtable`.`message_identifier` = CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime` THEN newData.`message_identifier` ELSE `testtable`.`message_identifier` END
,`testtable`.`statusupdatedatetime` = CASE WHEN newData.statusupdatedatetime > `testtable`.`statusupdatedatetime` THEN newData.`statusupdatedatetime` ELSE `testtable`.`statusupdatedatetime` END;

This all works fine, but is there a more efficient way of doing this?

regards,

Matthijs

1

There are 1 answers

0
Rick James On

Speed

IODKU does 2 steps for each given row. Your example has 4 rows.

  1. Fetch the row (based on any UNIQUE keys provided in the given row). This is like a SELECT and probably cannot be improved upon.

  2. Do one of these, depending on whether a row(s) was found. Since there is only one UNIQUE key involved, only mappedId is checked (rowid is also unique, but it is not listed).

  • If no row found, INSERT a new row.
  • If row(s) found, UPDATE them.

The CASE expressions (or any other expressions) take only a small amount of the overall time. Do not worry about optimizing them.

It seems like IFNULL( STRCMP(...), ...) = FALSE could be simplified to NOT STRCMP(...) or maybe STRCMP(...) IS NOT NULL. (or something like that)

Readability

Since everything seems to depend on

 newData.statusupdatedatetime > `testtable`.`statusupdatedatetime`

I would look for how to split the one statement into two, based on that. (Sorry, the query is too complex for me to wrap my head around and come up with a concrete suggestion.)