What is the most efficient way to utilize dask multiprocessing scheduler if data flow between tasks is big?

Question

607 views Asked by Alexander Reshytko At 12 December 2016 at 23:43

We have a dask compute graph (quite custom, so we use dask delayed instead of collections). I've read in the docs that the current scheduling policy is LIFO, so a worker process has a good chance of getting the data it has just computed for further steps down the graph. But as far as I understand, task computation results are still (de)serialized to the hard drive even in this case.

So the question is: how much performance would I gain by keeping as few tasks as possible along a single path of independent computations in a graph:

A) many small "map" tasks along each path

t --> t --> t -->...
                     some reduce stage
t --> t --> t -->...

B) one huge "map" task for each path

   T ->
        some reduce stage
   T ->

Thank you!

Original Q&A

There is 1 answer

**MRocklin** · Accepted Answer · 2016-12-13T01:05:48+00:00

The dask multiprocessing scheduler will automatically fuse linear chains of tasks into single tasks, so your case A above will automatically become case B.
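
To make "fusing a linear chain" concrete, here is a minimal pure-Python sketch of the idea. It is not Dask's implementation: the helpers `evaluate`, `dependencies`, `substitute` and `fuse_linear_chains` and the toy graph are hypothetical names made up for this illustration. The only thing borrowed from Dask is the convention of a graph as a dict mapping keys to `(function, *args)` tuples. The sketch assumes keys are strings and that each task uses its single input only once.

```python
from operator import add


def inc(x):
    return x + 1


def double(x):
    return x * 2


def is_task(obj):
    # A task is a tuple whose first element is callable: (func, *args).
    return isinstance(obj, tuple) and bool(obj) and callable(obj[0])


def evaluate(graph, arg):
    """Naively evaluate a key, a (possibly nested) task, or a literal."""
    if is_task(arg):
        func, *args = arg
        return func(*(evaluate(graph, a) for a in args))
    if isinstance(arg, str) and arg in graph:
        return evaluate(graph, graph[arg])
    return arg


def dependencies(graph, task):
    """Return the set of graph keys referenced anywhere inside `task`."""
    if is_task(task):
        deps = set()
        for a in task[1:]:
            deps |= dependencies(graph, a)
        return deps
    if isinstance(task, str) and task in graph:
        return {task}
    return set()


def substitute(task, key, replacement):
    """Replace references to `key` inside `task` with `replacement`."""
    if is_task(task):
        return (task[0],) + tuple(substitute(a, key, replacement) for a in task[1:])
    if isinstance(task, str) and task == key:
        return replacement
    return task


def fuse_linear_chains(graph, outputs):
    """Inline every parent that has exactly one dependent into that dependent,
    provided the dependent has no other inputs (i.e. the edge is linear)."""
    graph = dict(graph)
    changed = True
    while changed:
        changed = False
        deps = {k: dependencies(graph, t) for k, t in graph.items()}
        dependents = {k: set() for k in graph}
        for k, ds in deps.items():
            for d in ds:
                dependents[d].add(k)
        for child, ds in deps.items():
            if len(ds) != 1:
                continue
            (parent,) = ds
            if parent in outputs or dependents[parent] != {child}:
                continue
            # Linear edge parent -> child: merge the two into one task.
            graph[child] = substitute(graph[child], parent, graph[parent])
            del graph[parent]
            changed = True
            break  # the graph changed, so recompute dependencies
    return graph


# Case A from the question: two paths of small tasks feeding a reduce step.
graph = {
    "a1": (inc, 1), "a2": (double, "a1"), "a3": (inc, "a2"),
    "b1": (inc, 10), "b2": (double, "b1"), "b3": (inc, "b2"),
    "total": (add, "a3", "b3"),
}

fused = fuse_linear_chains(graph, outputs={"total"})
print(len(graph), "tasks before,", len(fused), "after")  # 7 tasks before, 3 after
print(evaluate(graph, "total"), evaluate(fused, "total"))  # 28 28
```

After fusion each path is a single nested task (case B), so nothing needs to be handed between processes until the reduce stage.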

If your workloads are more complex and do require inter-node communication then you might want to try the distributed scheduler on a single computer. It manages data movement between workers more intelligently.

$ pip install dask distributed

>>> from dask.distributed import Client
>>> c = Client()  # Starts local "cluster".  Becomes the global scheduler

Correction

Also, just as a note, Dask doesn't persist intermediate results on disk. Rather it communicates intermediate results directly between processes.
