How to terminate workers started by dask multiprocessing scheduler?
2.1k views · Asked by Arco Bast

After using the dask multiprocessing scheduler for a long period of time, I noticed that the Python processes started by the multiprocessing scheduler hold on to a lot of memory. How can I restart the worker pool?
Update: You can do this to kill the workers started by the multiprocessing scheduler:
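One way to do it (a sketch, not the exact code from the original answer: it assumes you manage the pool yourself through dask's configuration, which the multiprocessing scheduler consults via the pool key):

```python
import multiprocessing
import dask

# Create our own pool and hand it to dask, so we control its lifetime.
pool = multiprocessing.Pool(processes=4)
dask.config.set(pool=pool)

# ... run graphs with the multiprocessing scheduler ...
# result = my_collection.compute(scheduler='processes')

# When the workers have grown too large, tear them down
# and register a fresh pool for subsequent computations.
pool.terminate()  # stop the worker processes immediately
pool.join()       # wait for them to exit
dask.config.set(pool=multiprocessing.Pool(processes=4))
```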
First answer:
For tasks that consume a lot of memory, I prefer to use the distributed scheduler, even on localhost. It's very straightforward: use the distributed.Client class to submit your jobs.

I found this approach more reliable than the default scheduler. I prefer to explicitly submit each task and handle the future, so that I can use the progress widget, which is really nice in a notebook. You can also still do other work while waiting for the results.
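A minimal sketch of that workflow (assuming dask.distributed is installed; square and the task count are made up for illustration):

```python
from dask.distributed import Client, progress

def square(x):
    return x ** 2

client = Client()  # with no arguments this starts a local cluster

# Explicitly submit the tasks and keep the futures.
futures = [client.submit(square, i) for i in range(100)]
progress(futures)  # progress widget in a notebook, a text bar otherwise

# You are free to do other work here; gather blocks only when called.
results = client.gather(futures)
```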
If you run into memory errors, you can restart the workers (or restart the scheduler and start all over again), use smaller chunks of data, and try again.
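For the restart itself, the Client has a method for that (a short sketch; client is the Client from the example above, and note that restarting clears worker memory but also cancels any in-flight futures):

```python
# Kill all worker processes and start fresh ones;
# their memory is released and pending futures are lost.
client.restart()
```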