Can Lost Update happen in read committed isolation level in PostgreSQL?

I have a query like below in PostgreSQL:

UPDATE
    queue
SET
    status = 'PROCESSING'
WHERE
    queue.status = 'WAITING' AND
    queue.id     = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
RETURNING
    queue.id

and many workers try to claim one piece of work at a time (that's why I have the subquery with LIMIT 1). After this update, each worker uses the returned id to look up and process the work, but sometimes two workers grab the same work and it gets processed twice or more. The isolation level is Read Committed.

My question is: how can I guarantee that each piece of work is processed only once? I know there are many posts out there about this, but I have tried most of the suggestions and they didn't help:

  • I have tried SELECT FOR UPDATE, but it caused a deadlock situation.
  • I have tried pg_try_advisory_xact_lock, but it caused an "out of shared memory" error.
  • I tried adding AND pg_try_advisory_xact_lock(queue.id) to the outer query's WHERE clause, but ...

Any help would be appreciated.

There are 3 answers

Craig Ringer:

A lost update won't occur in the situation you describe, but it won't work properly either.

What will happen in the example you've given above is that given (say) 10 workers started simultaneously, all 10 of them will execute the subquery and get the same ID. They will all attempt to lock that ID. One of them will succeed; the others will block on the first one's lock. Once the first backend commits or rolls back, the 9 others will race for the lock. One will get it, re-check the WHERE clause and see that the queue.status test no longer matches, and return without modifying any rows. The same will happen with the other 8. So you used 10 queries to do the work of one query.
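
To make that concrete, here is a sketch of two sessions racing for the same row. The subquery is replaced by the literal id both workers would have picked; id = 1 is just an example:

-- Session A
BEGIN;
UPDATE queue SET status = 'PROCESSING'
WHERE status = 'WAITING' AND id = 1
RETURNING id;        -- returns 1 and holds the row lock

-- Session B, at the same time
BEGIN;
UPDATE queue SET status = 'PROCESSING'
WHERE status = 'WAITING' AND id = 1
RETURNING id;        -- blocks here, waiting for session A's row lock

-- Session A
COMMIT;              -- session B wakes up, re-checks status = 'WAITING' against the
                     -- committed row, finds it no longer matches, and updates zero rows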

If you fail to explicitly check the UPDATE result and see that zero rows were updated, you might think you were getting lost updates, but you aren't. You just have a concurrency bug in your application caused by a misunderstanding of the order-of-execution and isolation rules. All that's really happening is that you're effectively serializing your backends so that only one at a time actually makes forward progress.
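
In application code you would check the driver's reported row count; purely as a sketch of the same idea in PL/pgSQL, using the table and column names from the question:

DO $$
DECLARE
    claimed_id queue.id%TYPE;   -- same type as queue.id
    n_rows     integer;
BEGIN
    UPDATE queue
    SET status = 'PROCESSING'
    WHERE status = 'WAITING'
      AND id = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
    RETURNING id INTO claimed_id;

    GET DIAGNOSTICS n_rows = ROW_COUNT;

    IF n_rows = 0 THEN
        RAISE NOTICE 'nothing claimed: another worker won the race';
    ELSE
        RAISE NOTICE 'claimed job %', claimed_id;
    END IF;
END
$$;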

The only way PostgreSQL could avoid having them all get the same queue item ID would be to serialize them, so it didn't start executing query #2 until query #1 finished. If you want to, you can do this by LOCKing the queue table ... but again, you might as well just have one worker then.
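
A rough sketch of that serialized approach, only to show its shape (EXCLUSIVE mode conflicts with itself, so only one claimer can run at a time):

BEGIN;

-- Only one worker at a time can hold this lock; every other claimer queues behind it.
LOCK TABLE queue IN EXCLUSIVE MODE;

UPDATE queue
SET status = 'PROCESSING'
WHERE status = 'WAITING'
  AND id = (SELECT id FROM queue WHERE status = 'WAITING' LIMIT 1)
RETURNING id;

COMMIT;   -- releases the table lock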

You can't get around this with advisory locks, not easily anyway. Hacks where you iterated down the queue using non-blocking lock attempts until you got the first lockable item would work, but would be slow and clumsy.
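
For illustration only, one shape that hack could take. Note that PostgreSQL may evaluate pg_try_advisory_xact_lock() for candidate rows it ends up discarding, and those surplus advisory locks are held until the transaction ends, which can eat the shared lock-table memory:

UPDATE queue
SET status = 'PROCESSING'
WHERE id = (
    SELECT id
    FROM queue
    WHERE status = 'WAITING'
      AND pg_try_advisory_xact_lock(id)   -- skip rows some other worker has try-locked
    ORDER BY id
    LIMIT 1
)
RETURNING id;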

You are attempting to implement a work queue using the RDBMS. This will not work well. It will be slow, it will be painful, and getting it both correct and fast will be very very hard. Don't roll your own. Instead, use a well established, well tested system for reliable task queueing. Look at RabbitMQ, ZeroMQ, Apache ActiveMQ, Celery, etc. There's also PGQ from Skytools, a PostgreSQL-based solution.

tchelidze:

SKIP LOCKED (available since PostgreSQL 9.5) can be used to implement a queue in PostgreSQL.
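
For example, a claim query along these lines, using the table from the question:

UPDATE queue
SET status = 'PROCESSING'
WHERE id = (
    SELECT id
    FROM queue
    WHERE status = 'WAITING'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;

Each worker either claims a different waiting row or gets nothing back; rows already locked by other workers are skipped instead of blocking or being handed out twice.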

Super Kai - Kazuya Ito:

In PostgreSQL, a lost update can happen in READ COMMITTED and READ UNCOMMITTED (which PostgreSQL treats the same as READ COMMITTED), but if you use SELECT FOR UPDATE at those levels, a lost update doesn't happen.

In addition, a lost update doesn't happen in REPEATABLE READ or SERIALIZABLE whether or not you use SELECT FOR UPDATE; instead, an error is raised if a lost-update condition occurs.
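
As a small sketch of the SELECT FOR UPDATE pattern in READ COMMITTED (the id value is just a placeholder):

BEGIN;   -- READ COMMITTED is the default isolation level

-- Lock the row so no concurrent transaction can change it until we commit.
SELECT status
FROM queue
WHERE id = 42
FOR UPDATE;

-- Decide what to do in the application, then write the result back.
UPDATE queue
SET status = 'PROCESSING'
WHERE id = 42;

COMMIT;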