Deadlock while purging old records of Spring Batch job tables at a scheduled time using ShedLock


I came across a challenging scenario in which a transaction gets deadlocked by Microsoft SQL Server while executing a batch update query.

I see this error:

Transaction (Process ID 293) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.; nested exception is com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 293) was deadlocked on lock resources with another process and has been chosen as the deadlock victim.

My batch update query:

 jdbcTemplate.batchUpdate(purgeQueryArray)

I have 7-8 tables whose data needs to be purged if older than 7 days. In lower environments the data volume is low, so it works fine. In production we have as many as 300k to 500k rows in each table. While deleting that many records, the ShedLock task that executes the Spring JDBC query ends up in a deadlock. An API that performs the same operation works fine, but it is executed at a different time, so I am not sure of the load at the time the scheduled task runs.

@Scheduled(cron = "${scheduler.expression}", zone = "GMT")
@SchedulerLock(name = "SCHEDULER_LOCK", lockAtLeastFor = "10S", lockAtMostFor = "5M")
public void purge() {
    // prepare the array of purge queries, purgeQueryArray
    jdbcTemplate.batchUpdate(purgeQueryArray);
}

ShedLock table data:

name = SCHEDULER_LOCK, lock_until = 2020-10-21 00:00:15, locked_at = 2020-10-21 00:00:00, locked_by = tomcat-406116080-2-521278230

Though I have set lockAtMostFor = "5M", lock_until shows only 15 seconds past locked_at, which is strange. Could that be the reason? The operation will take 1-2 minutes for the production volume of data.
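
For reference, ShedLock also accepts ISO-8601 durations, which are unambiguous; a minimal sketch of the same method using those formats (PT10S = 10 seconds, PT5M = 5 minutes):

@Scheduled(cron = "${scheduler.expression}", zone = "GMT")
@SchedulerLock(name = "SCHEDULER_LOCK", lockAtLeastFor = "PT10S", lockAtMostFor = "PT5M")
public void purge() {
    // same purge logic as above
    jdbcTemplate.batchUpdate(purgeQueryArray);
}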

Any suggestion will be very much appreciated.

EDIT:

DELETE FROM BATCH_STEP_EXECUTION_CONTEXT WHERE STEP_EXECUTION_ID IN (SELECT BE.STEP_EXECUTION_ID FROM BATCH_STEP_EXECUTION BE
JOIN BATCH_STEP_EXECUTION_CONTEXT BEC ON BE.STEP_EXECUTION_ID = BEC.STEP_EXECUTION_ID
WHERE CAST(LAST_UPDATED AS date) < DATEADD(day, -7, GETDATE()));

DELETE FROM BATCH_STEP_EXECUTION WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION
WHERE CAST(LAST_UPDATED AS date) < DATEADD(day, -7, GETDATE()));

DELETE FROM BATCH_JOB_EXECUTION_CONTEXT WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION
WHERE CAST(LAST_UPDATED AS date) < DATEADD(day, -7, GETDATE()));

DELETE FROM BATCH_JOB_EXECUTION_PARAMS WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION
WHERE CAST(LAST_UPDATED AS date) < DATEADD(day, -7, GETDATE()));

DELETE FROM BATCH_JOB_EXECUTION WHERE CAST(LAST_UPDATED AS date) < DATEADD(day, -7, GETDATE());

DELETE FROM BATCH_JOB_INSTANCE WHERE JOB_INSTANCE_ID NOT IN (SELECT JOB_INSTANCE_ID FROM BATCH_JOB_EXECUTION);

Thanks in advance


1 Answer

Randy in Marin (best answer):

Is LAST_UPDATED from BATCH_STEP_EXECUTION? (I like to put an alias on every column because it's hard to read otherwise.)

A deadlock is more likely the more a transaction locks and the longer it holds those locks, so it's important to optimize the query. If that's not possible, then attempt small batch sizes to minimize the size of each transaction - see the sketch below.
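
A minimal T-SQL sketch of a batched delete, assuming LAST_UPDATED is a datetime column (the batch size of 5000 and the @cutoff variable are illustrative, not tuned values):

DECLARE @cutoff datetime = DATEADD(day, -7, CAST(GETDATE() AS date));

WHILE 1 = 1
BEGIN
    -- Each iteration is its own small transaction, so locks stay short-lived.
    DELETE TOP (5000) FROM BATCH_JOB_EXECUTION_PARAMS
    WHERE JOB_EXECUTION_ID IN (SELECT JOB_EXECUTION_ID FROM BATCH_JOB_EXECUTION
                               WHERE LAST_UPDATED < @cutoff);

    IF @@ROWCOUNT = 0 BREAK; -- nothing left to purge
END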

The first thing I would do is copy the data to a test environment. Then, as a first test, set the date back far enough to exclude any records. If that's still slow, then it's doing a scan, and a small batch size is not going to help - it might make it much worse.

The WHERE clause CAST(LAST_UPDATED as date) < DATEADD(day, -7, GETDATE()) covers the column with a function, so a table scan will result even if there is an index. Can you compare LAST_UPDATED directly to a local variable of exactly the same type?
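
A sketch of that idea - note the cutoff lands on midnight rather than the current time of day, so verify the boundary behavior matches what you want:

DECLARE @LAST_UPDATED_LIMIT datetime = DATEADD(day, -7, CAST(GETDATE() AS date));

-- Child tables must still be purged first, as in the original order.
DELETE FROM BATCH_JOB_EXECUTION
WHERE LAST_UPDATED < @LAST_UPDATED_LIMIT; -- an index on LAST_UPDATED can now seek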

Perhaps check the execution plan. It might indicate a problem.
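
In SSMS, for example, you can turn on the actual execution plan (Ctrl+M) before running the statement, or check the I/O counts to spot scans:

SET STATISTICS IO ON;   -- logical reads per table appear in the Messages tab
SET STATISTICS TIME ON; -- CPU and elapsed time

-- run one of the DELETE statements here and look for large scan counts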

Another option is to get the IDs first, if they are stable enough, perhaps into a temp table. Then do the delete in a separate transaction that joins to the table holding the IDs to delete. I would loop through the delete in batches.
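
A sketch of that approach, with an illustrative #doomed temp table and batch size:

DECLARE @LAST_UPDATED_LIMIT datetime = DATEADD(day, -7, CAST(GETDATE() AS date));

-- Capture the doomed IDs once...
SELECT JOB_EXECUTION_ID
INTO #doomed
FROM BATCH_JOB_EXECUTION
WHERE LAST_UPDATED < @LAST_UPDATED_LIMIT;

-- ...then delete child rows in small batches that join to the temp table.
WHILE 1 = 1
BEGIN
    DELETE TOP (5000) P
    FROM BATCH_JOB_EXECUTION_PARAMS P
    JOIN #doomed D ON D.JOB_EXECUTION_ID = P.JOB_EXECUTION_ID;

    IF @@ROWCOUNT = 0 BREAK;
END

-- Repeat the loop for the other child tables, then BATCH_JOB_EXECUTION itself.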

If the ID is an identity column or otherwise monotonically increasing, get the oldest ID to keep. That should be fast. Then delete all rows that have a smaller ID. (Be sure this is valid logic.)
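
A sketch, assuming JOB_EXECUTION_ID only ever grows along with LAST_UPDATED - verify that before trusting it:

DECLARE @LAST_UPDATED_LIMIT datetime = DATEADD(day, -7, CAST(GETDATE() AS date));
DECLARE @keepFromId bigint;

-- The oldest execution still inside the retention window.
SELECT @keepFromId = MIN(JOB_EXECUTION_ID)
FROM BATCH_JOB_EXECUTION
WHERE LAST_UPDATED >= @LAST_UPDATED_LIMIT;

IF @keepFromId IS NOT NULL
    DELETE FROM BATCH_JOB_EXECUTION_PARAMS
    WHERE JOB_EXECUTION_ID < @keepFromId;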

Maybe avoid the IN clause? Is this the same result?

DECLARE @LAST_UPDATED_LIMIT datetime = DATEADD(day, -7, CAST(GETDATE() AS date));

DELETE BEC
FROM BATCH_STEP_EXECUTION_CONTEXT BEC
INNER JOIN BATCH_STEP_EXECUTION BE
    ON BE.STEP_EXECUTION_ID = BEC.STEP_EXECUTION_ID
WHERE BE.LAST_UPDATED < @LAST_UPDATED_LIMIT; -- uncover the column if possible

Fun stuff. Good luck.