Using Dask distributed computing via Jupyter notebook


I am seeing strange behavior from Dask when using it from a Jupyter notebook. I initiate a local client and give it a list of jobs to do. My real code is a bit more complex, so here is a simple example:

from dask.distributed import Client

def inc(x):
    return x + 1

if __name__ == '__main__':
    c = Client()
    futures = [c.submit(inc, i) for i in range(1, 10)]
    result = c.gather(futures)
    print(len(result))

The problem is that I notice two things:

1. Dask starts more than 9 processes for this example (see the sketch below for how I counted the workers).
2. After the code has run and everything is done (nothing in the notebook is running), the processes created by Dask are not killed and the client is not shut down. When I run top, I can see all of those processes still alive.
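For context, here is a minimal sketch of how the worker processes can be counted from inside the notebook. It relies on Client.scheduler_info(), which reports the scheduler's view of its workers; the nanny and scheduler processes appear as separate entries in top:

from dask.distributed import Client

if __name__ == '__main__':
    c = Client()
    # One entry per worker process; nannies and the scheduler are extra processes.
    print(len(c.scheduler_info()['workers']))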

I saw in the documentation that there is a client.close() option, but interestingly enough, such functionality does not exist in version 0.15.2.

The only time the Dask processes are killed is when I stop the Jupyter notebook. This issue is causing strange and unpredictable performance behavior. Is there any way to kill the processes or shut the client down when no code is running in the notebook?
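For reference, this is the kind of explicit cleanup I was hoping for. It is only a sketch and assumes a newer dask.distributed in which Client.close() is available (which 0.15.2 apparently is not):

from dask.distributed import Client

def inc(x):
    return x + 1

if __name__ == '__main__':
    c = Client()
    futures = [c.submit(inc, i) for i in range(1, 10)]
    print(len(c.gather(futures)))
    # Explicitly tear down the client and the local workers it started.
    c.close()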


1 Answer

Answered by mdurant

The default Client accepts optional parameters which are passed on to LocalCluster (see the docs) and let you specify, for example, the number of worker processes you want. It is also a context manager, which closes itself and ends its processes when you are done:

from dask.distributed import Client

# inc is the same function as in the question
with Client(n_workers=2) as c:
    futures = [c.submit(inc, i) for i in range(1, 10)]
    result = c.gather(futures)
    print(len(result))

When the with block exits, the client closes and the worker processes are terminated.
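If you want more control, you can also construct the LocalCluster yourself and close it explicitly. The parameter values below (n_workers, threads_per_worker, processes) are just illustrative, and the sketch assumes a version in which Client.close() and LocalCluster.close() are available:

from dask.distributed import Client, LocalCluster

def inc(x):
    return x + 1

if __name__ == '__main__':
    # Explicitly configure the local cluster instead of relying on defaults.
    cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=True)
    c = Client(cluster)
    try:
        futures = [c.submit(inc, i) for i in range(1, 10)]
        print(len(c.gather(futures)))
    finally:
        # Shut down the client first, then the workers it was attached to.
        c.close()
        cluster.close()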