I have a MySQL table called cronjobs which holds entires for every cronjob needed (e.g. delete old emails, update profile age, and so on). For every cronjob there is a defined code block which gets executed if the cronjob is due (I got different intervals for different cronjobs).
For the execution of the due cronjobs, I got a PHP script which is executed by UNIX crontab every minute (calls execute_cronjobs_due.sh which calls "php -f /path/to/file/execute_cronjobs_due.php").
When executing execute_cronjobs_due.php all cronjobs get marked that they are going to be executed, so that another call of execute_cronjobs_due.php wouldn't cause a parallel execution of the same cronjob getting already executed.
Now the problem: Sometimes the execution takes more than 60 seconds but the crontab program does not call execute_cronjobs_due.sh after these 60 seconds. What actually happens is that execute_cronjobs_due.sh is called right after the execution of the execution of the previous crontab. And if an execution takes more than 120 seconds, the next two executions are initialize simultaneously.
Timeline:
2015-06-15 10:00:00: execution execute_cronjobs_due.sh (takes 140 seconds)
2015-06-15 10:02:20: two simultaneous executions of execute_cronjobs_due.sh
Since it is executed exactly simultaneous, there is no use of marking the cronjob that they are being executed since the selects (which should actually exclude the marked once) are executed at the exact same time. So the update occurs right after both already selected the due cronjobs.
How can I solve this problem, so that there are no simultaneous executions of cronjobs? Can I use MySQL table locks?
Thank you very much for your help in advance,
Frederic
Yes you could use mysql table locks, but this may be overkill for your situation. Anyway to do that in most generic way
for exact syntax and details read the docs obviusly https://dev.mysql.com/doc/refman/5.0/en/lock-tables.html , I personally never used table level locking so maybe there are some catches involved I am not aware of.
What I would do, if you use InnoDB table engine is to go with optimistic locking:
getmypid()
) or combination of host+pid. Or just generate guid if you don't know which will be perfectUPDATE cronjobs SET executed_by = my_id WHERE executed_by is null and /* whatever condition to get jobs to run */
SELECT * FROM cronjobs where executed_by = my_pid
UPDATE cronjobs set executed_by = null where executed_by = my_pid
This should be as easy to do, easier to track what happens and scale in the future (i.e. you can have few instances running running in parallel as long as they execute different scripts)
With this solution second script will not fail (technically), it will just run 0 jobs.
Minus is that you will have to clean jobs that were claimed but script failed to mark them as finished, but you probably have to do it anyway with current solution. The easiest way would be to add a timestamp column that would track when was the job claimed last time and expire it after i.e. 15 minutes or an hour depending on business requirements (short pseudocode: first update will do
SET executed_by = my_id, started_at = NOW() where executed_by is null or (executed_by is not null and started_at < NOW() - 1 hour)
)