How to enable proper work stealing in dask.distributed when using task restrictions / worker resources?

468 views Asked by At

Context

I'm using dask.distributed to parallelise computations across machines. I therefore have dask-workers running on the different machines which connect to a dask-scheduler, to which I can then submit my custom graphs to together with the required keys.

Due to network mount restrictions, my input data (and output storage) is only available to a subset of the machines ('i/o-hosts'). I tried to deal with this in two ways:

  1. all tasks involved in i/o operations are restricted to i/o-hosts (they can run only on workers which run on machines that have access to the data) and non i/o tasks are restricted to non-i/o-hosts
  2. all tasks involved in i/o operations are bound to workers providing the resource 'io' (the i/o-hosts) and non i/o tasks are bound to workers on non-i/o-hosts which provide the resource 'compute'.

The idea behind not allowing non i/o tasks to run on i/o-hosts is to make sure their workers are available for i/o tasks.

Problem

Both approaches work as expected in the sense that they restrict i/o-tasks to the right workers. However I noticed that when using any of the two approaches, only very few workers accumulate a lot of tasks while most other workers remain idle.

After reading up on how tasks are distributed among the workers I found that work stealing seems to be intentionally disabled for restricted tasks (http://distributed.readthedocs.io/en/latest/work-stealing.html). This seems to also apply for the resources framework.

Question

Is there a good way to combine restrictions on tasks with work stealing?

0

There are 0 answers