Prevent parallel execution using a table lock (MySQL)

2.1k views Asked by At

I have a MySQL table called cronjobs which holds entires for every cronjob needed (e.g. delete old emails, update profile age, and so on). For every cronjob there is a defined code block which gets executed if the cronjob is due (I got different intervals for different cronjobs).

For the execution of the due cronjobs, I got a PHP script which is executed by UNIX crontab every minute (calls execute_cronjobs_due.sh which calls "php -f /path/to/file/execute_cronjobs_due.php").

When executing execute_cronjobs_due.php all cronjobs get marked that they are going to be executed, so that another call of execute_cronjobs_due.php wouldn't cause a parallel execution of the same cronjob getting already executed.

Now the problem: Sometimes the execution takes more than 60 seconds but the crontab program does not call execute_cronjobs_due.sh after these 60 seconds. What actually happens is that execute_cronjobs_due.sh is called right after the execution of the execution of the previous crontab. And if an execution takes more than 120 seconds, the next two executions are initialize simultaneously.

Timeline:

2015-06-15 10:00:00: execution execute_cronjobs_due.sh (takes 140 seconds)

2015-06-15 10:02:20: two simultaneous executions of execute_cronjobs_due.sh

Since it is executed exactly simultaneous, there is no use of marking the cronjob that they are being executed since the selects (which should actually exclude the marked once) are executed at the exact same time. So the update occurs right after both already selected the due cronjobs.

How can I solve this problem, so that there are no simultaneous executions of cronjobs? Can I use MySQL table locks?

Thank you very much for your help in advance,

Frederic

2

There are 2 answers

0
Mariusz Sakowski On BEST ANSWER

Yes you could use mysql table locks, but this may be overkill for your situation. Anyway to do that in most generic way

  1. Make sure that you have autocommit off
  2. LOCK TABLES cronjobs;
  3. do your stuff
  4. UNLOCK TABLES

for exact syntax and details read the docs obviusly https://dev.mysql.com/doc/refman/5.0/en/lock-tables.html , I personally never used table level locking so maybe there are some catches involved I am not aware of.

What I would do, if you use InnoDB table engine is to go with optimistic locking:

  1. start transaction as a first thing in your script
  2. get some id of script or whatever, might be process pid (getmypid()) or combination of host+pid. Or just generate guid if you don't know which will be perfect
  3. do something like UPDATE cronjobs SET executed_by = my_id WHERE executed_by is null and /* whatever condition to get jobs to run */
  4. then SELECT * FROM cronjobs where executed_by = my_pid
  5. do your stuff on whatever above select returned
  6. UPDATE cronjobs set executed_by = null where executed_by = my_pid

This should be as easy to do, easier to track what happens and scale in the future (i.e. you can have few instances running running in parallel as long as they execute different scripts)

With this solution second script will not fail (technically), it will just run 0 jobs.

Minus is that you will have to clean jobs that were claimed but script failed to mark them as finished, but you probably have to do it anyway with current solution. The easiest way would be to add a timestamp column that would track when was the job claimed last time and expire it after i.e. 15 minutes or an hour depending on business requirements (short pseudocode: first update will do SET executed_by = my_id, started_at = NOW() where executed_by is null or (executed_by is not null and started_at < NOW() - 1 hour))

0
Jens A. Koch On

How can I solve this problem, so that there are no simultaneous executions of cronjobs?

There are multiple ways to solve this. They might be helpful as-well:

My suggestion is to keep it simple and use either a file-locking or file-exist checking approach.

Can I use MySQL table locks?

Yes, but it's a bit overkill.

You would use a "cronjob processing table" with a cronjob status column ("ToDo, Started, Complete" or "Todo, Running, Done") and a PID column. Then you select jobs and mark their state by using transactions. That makes sure that "Selecting a job from Todo" and "marking it as running/started" is done in one step. In the end, you might still have multiple exec's of your "central cronjob processing script", but jobs are NOT selected multiple times for processing.