I have a lot of entries in a table that are fetched for performing jobs. this is scaled to several servers.
when a server fetches a bunch of rows to add to its own job queue they should be "locked" so that no other server fetches them. when the update is performed a timestamp is increased and they are "unlocked".
i currently do this by updating a field that is called "jobserver" in the table that defaults to null with the id of the jobserver.
a job server only selects rows where the field is null.
when all rows are processed their timestamp is updated and finally the job field set to null again.
so i need to synchronize this:
$jobs = mysql_query("
SELECT itemId
FROM items
WHERE
jobserver IS NULL
AND
DATE_ADD(updated_at, INTERVAL 1 DAY) < NOW()
LIMIT 100
");
mysql_query("UPDATE items SET jobserver = 'current_job_server' WHERE itemId IN (".join(',',mysql_fetch_assoc($jobs)).")");
// do the update process in foreach loop
// update updated_at for each item and set jobserver to null
every server executes the above in an infinite loop. if no fields are returned, everything is up 2 date (last update is not longer ago than 24 hours) and is sent to 10 minutes.
I currently have MyIsam and i would like to stay with it because it had far better performance than innodb in my case, but i heard that innodb has ACID transactions.
So i could execute the select and update as one. but how would that look and work?
the problem is that i cannot afford to lock the table or something because other processes neeed to read/write and cannot be locked.
I am also open to a higher level solution like a shared semaphore etc. the problem is the synchronization needs to be across several servers.
is the approach generally sane? would you do it differently?
how can i synchronize the job selectino to ensure that two servers dont update the same rows?
You can run the UPDATE first but with the WHERE and LIMIT that you had on the SELECT. You then SELECT the rows that have the jobserver field set to your server.