We have queues of Ticket objects persisted to the database. Our entity containing the queues looks like this:
@Entity
public class StoreQueueCollection {
@Id
private int storeId;
@ManyToMany(fetch=FetchType.LAZY)
@OrderColumn
private List<Ticket> mainQueue = new ArrayList<Ticket>();
@ManyToMany(fetch=FetchType.LAZY)
@OrderColumn
private List<Ticket> cancelledQueue = new ArrayList<Ticket>();
.. etc
We have one operation which moves a ticket from one queue to another (call this changeStatus), and another which adds a new ticket to the end of a queue (call this newTicket).
When the two operations interleave on the same queue, the operations basically work, but we end up with a "gap" in our queue. In the database, this looks like a missing index in the order column of the table, like this: 0, 1, 2, 4. When the queue is loaded back into Java, the missing index becomes a null element in the queue.
We are using pessimistic locking on the StoreQueueCollection object to prevent inconsistent interleaved updates, but it's not working as we would expect. With extra logging, we see strange sequences like this:
- changeTicketStatus() starts - lock the queue using entityManager.refresh() - ticket X removed from front of queue A - queue is entityManager.flush()ed * newTicket() starts * creates a new ticket Y, locks the StoreQueueCollection using entityManager.refresh(); * ticket Y added to the end of queue A * ticket Y has fields initialized and is save()d * call refresh(), method is blocked - changeTicketStatus() resumes (printing in-memory queue shows that ticket X is not in queue A) - ticket X added to another queue B - ticket X has some fields modified - ticket X is saved using repository.save() (printing in-memory queue shows that ticket X is not in queue A) - changeTicketStatus() completes * newTicket() resumes * refresh() returns (printing in-memory queue A shows that ticket X is still in queue!) * ticket Y is added to end of queue A (printing in-memory queue A shows that ticket X and Y are in the queue) * queues are save()d
All locks are LockModeType.PESSIMISTIC_WRITE and scope is PessimisticLockScope.EXTENDED.
After this sequence of execution, an assertion triggers from another thread which checks for null entries in the queue. Mysteriously, the queue is basically correct (X is deleted, Y is added to the end), but there is a gap in the order column before Y.
Any suggestions on what we're doing wrong are greatly appreciated!
Are you locking single rows, or are all of your threads locking a common 'queue' row? If the latter, keep in mind that you need to maintain a transaction for the entire time you want the lock to be held. If the former, locking single (different) rows won't do anything to prevent interleaving operations.
You didn't mention what backend you're using, but sometimes database logging tools (like SQL Server's profiler and activity monitor) can be helpful for determining what is going on, since you can get an ordered list of all SQL statements issued to the database.