There are distributed computation nodes and there are set of computation tasks represented by rows in a database table (a row per task):
- A node has no information about other nodes: can't talk other nodes and doesn't even know how many other nodes there are
- Nodes can be added and removed, nodes may die and be restarted
- A node connected only to a database
- There is no limit of tasks per node
- Tasks pool is not finite, new tasks always arrive
- A node takes a task by marking that row with some timestamp, so that other nodes don't consider it until some timeout is passed after that timestamp (in case of node death and task not done)
The goal is to evenly distribute tasks among nodes. To achieve that I need to define some common algorithm of tasks acquisition: when a node starts, how many tasks to take?
If a node takes all available tasks, when one node is always busy and others idle. So it's not an option.
A good approach would be for each node to take tasks 1 by 1 with some delay. So each node periodically (once in some time) checks if there are free tasks and takes only 1 task. In this way, shortly after start all nodes acquire all tasks that are more or less equally distributed. However the drawback is that because of the delay, it would take some time to take last task into processing (say there are 10000 tasks, 10 nodes, delay is 1 second: it would take 10000 tasks * 1 second / 10 nodes = 1000 seconds from start until all tasks are taken). Also the distribution is non-deterministic and skew is possible.
Question: what kind/class of algorithms solve such problem, allowing quickly and evenly distribute tasks using some sync point (database in this case), without electing a leader?
For example: nodes use some table to announce what tasks they want to take, then after some coordination steps they achieve consensus and start processing, etc.
So this comes down to a few factors to consider.
My inclination is do the following:
2 * available_tasks / active_nodes - nodes_accepted_tasks
. So if there are 100 available tasks, 10 active nodes, and this node has accepted 5 task already. Then it would accept:100 / 10 - 5 = 5
tasks. If nodes only look for more tasks when they no longer have any tasks then you can just useavailable_tasks / active_nodes
.If node identifier is impractical. I would just say that each node should aim to take
ceil(sqrt(N))
random tasks, whereN
is the number of available tasks. If there are 100 tasks. The first node will take 10, the second will take 10, the 3rd will take 9, the 4th will take 9, the 5th will take 8, and so on. This won't evenly distribute all the tasks at once, but it will ensure the nodes get a roughly even number of tasks. The slight staggering of # of tasks means that the nodes will not all finish their tasks at the same time (which admittedly may or may not be desirable). By not fully distributing them (unless there are sqrt(N) nodes), it also reduces the likelihood of conflicts (especially if tasks are randomly selected) are reduced. It also reduces the number of "failed" tasks if a node goes down.This of course assumes that a node can request more tasks after it has started, if not, it makes it much more tricky.
As for an additional table, you could actually use that to keep track of the current status of the nodes. Each node records how many tasks it has, it's unique UUID and when it last completed a task. Though that may have potential issues with database churn. I think it's probably good enough to just record which node has accepted the task along with when it accepted it. This is again more useful if nodes can request tasks in the future.