How to reduce IPython parallel memory usage


I'm using IPython parallel in an optimisation algorithm that loops a large number of times. Parallelism is invoked in the loop using the map method of a LoadBalancedView (twice), a DirectView's dictionary interface, and a %px magic invocation. I'm running the algorithm in an IPython notebook.
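
For context, a minimal sketch of that loop structure (the task function, data and loop count are made up; only the IPython.parallel calls mirror what the question describes):

    from IPython.parallel import Client

    rc = Client()
    dview = rc[:]                      # DirectView over all engines
    lbview = rc.load_balanced_view()   # LoadBalancedView used for the map calls

    def objective(x):
        # stand-in for the real per-candidate optimisation work
        return x * x

    candidates_a = range(8)
    candidates_b = range(8, 16)

    for i in range(1000):                             # the real run is ~38,000 loops
        dview['iteration'] = i                        # DirectView dictionary interface
        results_a = lbview.map(objective, candidates_a, block=True)   # first map
        results_b = lbview.map(objective, candidates_b, block=True)   # second map
        dview.execute('local_state = iteration * 2')  # script analogue of the %px magic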

I find that the memory consumed by both the kernel running the algorithm and one of the controllers increases steadily over time, limiting the number of loops I can execute (since available memory is limited).

Using heapy, I profiled memory use after a run of about 38 thousand loops:

Partition of a set of 98385344 objects. Total size = 18016840352 bytes.  
 Index  Count     %       Size   %  Cumulative   % Kind (class / dict of class)
     0  5059553   5 9269101096  51  9269101096  51 IPython.parallel.client.client.Metadata
     1 19795077  20 2915510312  16 12184611408  68 list
     2 24030949  24 1641114880   9 13825726288  77 str
     3  5062764   5 1424092704   8 15249818992  85 dict (no owner)
     4 20238219  21  971434512   5 16221253504  90 datetime.datetime
     5   401177   0  426782056   2 16648035560  92 scipy.optimize.optimize.OptimizeResult
     6        3   0  402654816   2 17050690376  95 collections.defaultdict
     7  4359721   4  323814160   2 17374504536  96 tuple
     8  8166865   8  196004760   1 17570509296  98 numpy.float64
     9  5488027   6  131712648   1 17702221944  98 int 
<1582 more rows. Type e.g. '_.more' to view.>
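
For reference, a snapshot like the one above can be taken from the notebook kernel with guppy's heapy; this is only a sketch, since the question does not show the exact profiling calls used:

    from guppy import hpy

    hp = hpy()
    hp.setrelheap()      # optional: only measure growth from this point onwards
    # ... run the optimisation loops ...
    heap = hp.heap()     # partition of reachable objects, as printed above
    print(heap)          # the per-class count/size table
    print(heap.byrcs)    # the same objects classified by referrer, to see what holds them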

You can see that about half the memory is used by IPython.parallel.client.client.Metadata instances. A good indicator that results from the map invocations are being cached is the 401177 OptimizeResult instances: that matches the number of optimize calls made via lbview.map, and I am not caching them anywhere in my own code.

Is there a way I can control this memory usage on both the kernel and the IPython parallel controller (whose memory consumption is comparable to the kernel's)?
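
One way to confirm where the kernel-side memory goes is to look at the per-task caches the Client keeps (a sketch; rc stands for the IPython.parallel Client instance behind the views):

    from IPython.parallel import Client

    rc = Client()               # in practice, the Client the views were created from
    print(len(rc.results))      # cached result objects, one per completed task
    print(len(rc.metadata))     # cached Metadata instances, the top entry in the heapy output
    print(len(rc.history))      # msg_ids of every request submitted so far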


1 Answer

Answered by drevicko:

IPython parallel clients and controllers store results and other metadata from past transactions.

The IPython.parallel.Client class provides a method for clearing this data:

Client.purge_everything()

documented here. There are also purge_results() and purge_local_results() methods that give you some control over what gets purged.
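
A minimal sketch of purging from inside the loop (the task function and the purge interval of 100 are made up; only the purge calls come from the methods named above):

    from IPython.parallel import Client

    rc = Client()
    lbview = rc.load_balanced_view()

    def objective(x):
        return x * x

    for i in range(38000):
        results = lbview.map(objective, range(16), block=True)
        # ... rest of the loop body ...
        if i % 100 == 0:                    # purge interval is arbitrary
            rc.purge_results('all')         # ask the hub/controller to drop stored results
            rc.purge_local_results('all')   # drop the client-side results/metadata caches
            # rc.purge_everything()         # or clear local caches, history and the hub's db

Purging the hub's stored results (purge_results or purge_everything) is what keeps the controller's memory in check; the local purges address the kernel running the client.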