Context
I'm using dask.distributed to parallelise computations across machines. I therefore have dask-workers running on the different machines, which connect to a dask-scheduler, to which I can then submit my custom graphs together with the required keys.
Due to network mount restrictions, my input data (and output storage) is only available to a subset of the machines ('i/o-hosts'). I tried to deal with this in two ways:
- all tasks involved in i/o operations are restricted to i/o-hosts (they can run only on workers which run on machines that have access to the data), and non-i/o tasks are restricted to non-i/o-hosts
- all tasks involved in i/o operations are bound to workers providing the resource 'io' (the i/o-hosts), and non-i/o tasks are bound to workers on non-i/o-hosts which provide the resource 'compute'
The idea behind not allowing non-i/o tasks to run on the i/o-hosts is to make sure their workers remain available for i/o tasks.
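For reference, a minimal in-process sketch of the resources approach (the resource name `io`, the worker counts, and the `load_chunk` task are illustrative assumptions; in the real deployment, only the dask-workers on the i/o-hosts would be started with `--resources "io=1"`):

```python
from dask.distributed import Client

def load_chunk(x):
    # stand-in for a real i/o-bound task
    return x * 2

# In-process workers for demonstration; here every local worker
# advertises the 'io' resource, unlike the real multi-host setup.
client = Client(processes=False, n_workers=2, threads_per_worker=1,
                resources={"io": 1})

# resources= binds the task to workers offering 'io'; the host-restriction
# variant would instead pass workers=["io-host-1", ...] to submit().
fut = client.submit(load_chunk, 21, resources={"io": 1})
result = fut.result()
print(result)  # 42

client.close()
```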
Problem
Both approaches work as expected in the sense that they restrict i/o tasks to the right workers. However, I noticed that with either approach, a few workers accumulate a lot of tasks while most of the other workers remain idle.
After reading up on how tasks are distributed among the workers, I found that work stealing appears to be intentionally disabled for restricted tasks (http://distributed.readthedocs.io/en/latest/work-stealing.html). This also seems to apply to the resources framework.
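As a sanity check, the scheduler-wide work-stealing flag can be inspected via dask's config (a sketch, assuming a recent dask/distributed; note this flag toggles the stealing mechanism as a whole and, per the linked docs, restricted tasks are excluded from stealing regardless of it):

```python
import dask
import distributed  # importing distributed registers its config defaults

# Scheduler-wide work-stealing toggle (True by default)
stealing = dask.config.get("distributed.scheduler.work-stealing")
print(stealing)
```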
Question
Is there a good way to combine restrictions on tasks with work stealing?