Recommend algorithm of fair distributed resources allocation consensus

116 views Asked by At

There are distributed computation nodes and there are set of computation tasks represented by rows in a database table (a row per task):

  • A node has no information about other nodes: can't talk other nodes and doesn't even know how many other nodes there are
  • Nodes can be added and removed, nodes may die and be restarted
  • A node connected only to a database
  • There is no limit of tasks per node
  • Tasks pool is not finite, new tasks always arrive
  • A node takes a task by marking that row with some timestamp, so that other nodes don't consider it until some timeout is passed after that timestamp (in case of node death and task not done)

The goal is to evenly distribute tasks among nodes. To achieve that I need to define some common algorithm of tasks acquisition: when a node starts, how many tasks to take?

If a node takes all available tasks, when one node is always busy and others idle. So it's not an option.

A good approach would be for each node to take tasks 1 by 1 with some delay. So each node periodically (once in some time) checks if there are free tasks and takes only 1 task. In this way, shortly after start all nodes acquire all tasks that are more or less equally distributed. However the drawback is that because of the delay, it would take some time to take last task into processing (say there are 10000 tasks, 10 nodes, delay is 1 second: it would take 10000 tasks * 1 second / 10 nodes = 1000 seconds from start until all tasks are taken). Also the distribution is non-deterministic and skew is possible.

Question: what kind/class of algorithms solve such problem, allowing quickly and evenly distribute tasks using some sync point (database in this case), without electing a leader?

For example: nodes use some table to announce what tasks they want to take, then after some coordination steps they achieve consensus and start processing, etc.

1

There are 1 answers

0
Nuclearman On

So this comes down to a few factors to consider.

  1. How many tasks are currently available overall?
  2. How many tasks are currently accepted overall?
  3. How many tasks has the node accepted in the last X minutes.
  4. How many tasks has the node completed in the last X minutes.
  5. Can the row fields be modified? (A field added).
  6. Can a node request more tasks after it has finished it's current tasks or must all tasks be immediately distributed?

My inclination is do the following:

  1. If practical, add a "node identifier" field (UUID) to the table with the rows. A node when ran generates a UUID node identifier. When it accepts a task it adds a timestamp and it's UUID. This easily let's other nodes be able to determine how many "active" nodes there are.
  2. To determine allocation, the node determines how many tasks are available/accepted. it then notes how many many unique node identifiers (including itself) have accepted tasks. It then uses this formula to accept more tasks (ideally at random to minimize the chance of competition with other nodes). 2 * available_tasks / active_nodes - nodes_accepted_tasks. So if there are 100 available tasks, 10 active nodes, and this node has accepted 5 task already. Then it would accept: 100 / 10 - 5 = 5 tasks. If nodes only look for more tasks when they no longer have any tasks then you can just use available_tasks / active_nodes.
  3. To avoid issues, there should be a max number of tasks that a node will accept at once.

If node identifier is impractical. I would just say that each node should aim to take ceil(sqrt(N)) random tasks, where N is the number of available tasks. If there are 100 tasks. The first node will take 10, the second will take 10, the 3rd will take 9, the 4th will take 9, the 5th will take 8, and so on. This won't evenly distribute all the tasks at once, but it will ensure the nodes get a roughly even number of tasks. The slight staggering of # of tasks means that the nodes will not all finish their tasks at the same time (which admittedly may or may not be desirable). By not fully distributing them (unless there are sqrt(N) nodes), it also reduces the likelihood of conflicts (especially if tasks are randomly selected) are reduced. It also reduces the number of "failed" tasks if a node goes down.

This of course assumes that a node can request more tasks after it has started, if not, it makes it much more tricky.

As for an additional table, you could actually use that to keep track of the current status of the nodes. Each node records how many tasks it has, it's unique UUID and when it last completed a task. Though that may have potential issues with database churn. I think it's probably good enough to just record which node has accepted the task along with when it accepted it. This is again more useful if nodes can request tasks in the future.