This is a rough example of how I leverage multiprocessing with pathos:
from pathos.multiprocessing import ProcessingPool
pool = ProcessingPool(10)
results = pool.map(func, args)
Each run of func can take a while; say it takes 5 minutes and len(args) == 20.
In that case the whole job takes around 10 minutes: two batches of 10 tasks running in parallel.
During this period RAM usage steadily grows, and the memory is only freed once all the work is done.
The main question: how do I change the approach so that memory is freed each time a task finishes, instead of waiting for all of them to finish? Otherwise, with 100 args the total RAM consumption would be 5 times higher than with 20 args, even though the tasks are all computed in parallel chunks of 10.
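One common way to release each task's memory as soon as it finishes is to recycle the worker processes themselves. This sketch uses the stdlib multiprocessing module rather than pathos (which may not expose the same option): maxtasksperchild=1 makes each worker exit after completing one task, returning all of its memory to the OS before a fresh worker picks up the next arg.

```python
import multiprocessing as mp

def func(arg):
    # Stand-in for the real long-running task.
    return arg * arg

def run(args):
    # maxtasksperchild=1: every worker process is replaced after a
    # single task, so per-task allocations cannot accumulate.
    with mp.Pool(processes=4, maxtasksperchild=1) as pool:
        return pool.map(func, args)
```

Restarting workers adds process-startup overhead per task, which is negligible here given that each task runs for minutes.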
Besides, the reason behind the memory growth is unclear. I allocate memory at the start of func, yet the usage keeps growing over time. The return value of func is always 0; I store the results on disk.
Also, is there a way to have a few arrays reside in shared memory, so that each process doesn't have to make its own copy?
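For the shared-array question, here is a minimal sketch using the stdlib multiprocessing module (pathos pools may not accept the same arguments): a lock-free shared array is created once and handed to each worker through the pool initializer, so workers read it without making per-process copies. The names init_worker and run_demo are illustrative, not from the original post.

```python
import multiprocessing as mp

_shared = None  # populated in each worker by init_worker

def init_worker(arr):
    # Store the shared array in a module-level global of the worker.
    global _shared
    _shared = arr

def func(i):
    # Each task reads from the shared array rather than a private copy.
    return _shared[i] * 2

def run_demo():
    # lock=False gives a raw shared ctypes array ("d" = double);
    # fine here because workers only read it.
    data = mp.Array("d", [0.5, 1.5, 2.5], lock=False)
    with mp.Pool(2, initializer=init_worker, initargs=(data,)) as pool:
        return pool.map(func, range(3))
```

Shared ctypes arrays must be passed at worker startup (as done here via initargs), not inside the task arguments of pool.map.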
I was making plots inside my func. To make those plots, I used matplotlib.pyplot's interface. It turns out that matplotlib keeps references to all figures created via pyplot, which is why the RAM wasn't released. So the issue has nothing to do with pathos. Clearing the figures solved it.
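Concretely, the fix is to close each figure after saving it, so pyplot drops its internal reference. A minimal sketch (the plotting details are placeholders for the real func body):

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, safe inside worker processes
import matplotlib.pyplot as plt

def func(arg):
    fig, ax = plt.subplots()
    ax.plot([0, 1], [0, arg])
    buf = io.BytesIO()  # stands in for the real on-disk destination
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure so pyplot drops its reference
    return 0
```

plt.close("all") at the end of func works too if several figures are created per task.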