We have a dask compute graph (quite custom so we use dask delayed instead of collections). I've read in the docs that current scheduling policy is LIFO so that a worker process has big chances to get the data it has just computed for further steps down the graph. But as far as I understood task computation results are still (de)serialized to hard drive in even in this case.
So the question is how much performance gain would I get trying to keep as little tasks as possible down a single path of independent computations in a graph:
A) many small "map" tasks along each path
t --> t --> t -->...
some reduce stage
t --> t --> t -->...
B) one huge "map" task along for each path
T ->
some reduce stage
T ->
Thank you!
The dask multiprocessing scheduler will automatically fuse linear chains of tasks into single tasks, so your case A above will automatically become case B.
If your workloads are more complex and do require inter-node communication then you might want to try the distributed scheduler on a single computer. It manages data movement between workers more intelligently.
Further reading
Correction
Also, just as a note, Dask doesn't persist intermediate results on disk. Rather it communicates intermediate results directly between processes.