It is a well-known pattern to use `LPUSH` and `BRPOPLPUSH` (http://redis.io/commands/rpoplpush) to implement a durable queue in Redis. However, in order to scale up, the design needs to cater for multiple workers/consumers that `BRPOPLPUSH` from the main task queue.

So the norm seems to be that for each worker there is a separate `processing_queue` that records the task a specific worker is currently working on, so that the worker can keep track of what is left to do in case it exits during processing.
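For concreteness, here is a minimal sketch of the pattern as I understand it, using redis-py; the key names `tasks` and `processing:<worker_id>` are just placeholders of mine:

```python
import redis

r = redis.Redis()

# Producer side: push a new task onto the left end of the main queue.
r.lpush('tasks', 'task-payload')

# Worker side: atomically move the oldest task (right end) from the
# main queue onto this worker's own processing queue, blocking until
# a task becomes available.
worker_id = 'worker-1'
task = r.brpoplpush('tasks', 'processing:' + worker_id, timeout=0)
```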
I have two questions regarding this `processing_queue`:
1. Is it correct to reason that there will be at most one item/task at any time in a worker's `processing_queue`? I assume that a worker starts by checking for any left-over task in its own `processing_queue` before it `BRPOPLPUSH`es from the main task queue (a sketch of this loop follows below). If so, we can use any of `RPOP`, `LPOP`, or `LREM` to remove a task once the worker finishes processing (or it can simply delete the list). We could even use a set instead of a list. Is there any reason why so many people choose to use `LREM` and nothing else?

2. I have seen many people recommend identifying the individual `processing_queue` by the process ID of the corresponding worker. But what happens when the old worker exits and a new one gets spawned with a (most likely) different process ID? How does the new worker look up its predecessor's `processing_queue` to finish a possible left-over task? I plan to use Supervisor to manage my worker processes, if that makes a difference.
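For reference, this is roughly the worker loop I have in mind (redis-py again; `process()` is a hypothetical stand-in for the actual work):

```python
import redis

r = redis.Redis()

def process(task: bytes) -> None:
    # Hypothetical placeholder for the actual work.
    print('processing', task)

def run_worker(worker_id: str) -> None:
    processing_queue = 'processing:' + worker_id
    while True:
        # Finish any left-over task from a previous run first.
        # BRPOPLPUSH pushes onto the left end of the destination,
        # so a leftover task sits at index 0.
        task = r.lindex(processing_queue, 0)
        if task is None:
            # No leftover: block until a task arrives on the main
            # queue and atomically move it onto our processing queue.
            task = r.brpoplpush('tasks', processing_queue, timeout=0)
        process(task)
        # Acknowledge completion: LREM removes the task by value
        # (count=1 deletes a single matching entry).
        r.lrem(processing_queue, 1, task)
```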
1. That depends. If you're only interested in having the worker pull a job from the main queue, process it, rinse and repeat, then yes. However, this isn't a hard requirement but rather a design choice. For example, you can have the worker push additional items into the `processing_queue` when you want to serialize jobs that are derived from the main one.

2. I'm not familiar with Supervisor, but you can keep a separate data structure - a Sorted Set would be best here - with your workers' identifiers as members and a timestamp as their score. Have your workers update their timestamps periodically, and when you launch a new worker, have it check that set for workers that have been idle for too long. When such a worker is found, try ascertaining that it is in fact dead, and then have the new worker take over the relevant `processing_queue` (e.g. with a `RENAME`), as sketched below.
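A rough sketch of that idea, assuming redis-py; the key names, the 60-second idle threshold, and the omitted liveness check are all placeholder assumptions:

```python
import time
import redis

r = redis.Redis()

HEARTBEATS = 'worker:heartbeats'  # Sorted Set: member = worker id, score = last-seen time
IDLE_LIMIT = 60                   # seconds of silence before a worker is considered suspect

def heartbeat(worker_id: str) -> None:
    # Called periodically by every live worker to refresh its timestamp.
    r.zadd(HEARTBEATS, {worker_id: time.time()})

def adopt_stale_queues(new_worker_id: str) -> None:
    # On startup, look for workers whose heartbeat is too old.
    cutoff = time.time() - IDLE_LIMIT
    for stale in r.zrangebyscore(HEARTBEATS, '-inf', cutoff):
        stale_id = stale.decode()
        # ...ascertain here that the worker is really dead (e.g. via an
        # OS- or Supervisor-level check - an assumption, not shown)...
        try:
            # Take over the dead worker's processing queue.
            r.rename('processing:' + stale_id, 'processing:' + new_worker_id)
        except redis.exceptions.ResponseError:
            pass  # RENAME raises an error if the source key doesn't exist
        r.zrem(HEARTBEATS, stale_id)
```

Note that `RENAME` overwrites the destination key if it already exists, so the new worker should adopt stale queues before it starts accumulating tasks of its own.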