How to tackle Dask unmanaged memory on Windows when using delayed functions?


I have the plain Python function below. It does no array-style computation, but I need to run it many times, so I parallelized it with dask.delayed. However, I can see unmanaged memory gradually accumulating.

I went through the blog post "Tackling unmanaged memory with Dask". I suspect the unmanaged memory is due to the presence of many small Python objects. It remained even after the entire process had completed. However, I could not test this, or use the manual trimming approach mentioned in the blog, as I am on Windows.
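One way to probe the "many small Python objects" hypothesis without any Linux-only tooling is to count the objects the garbage collector is still tracking in each process. The snippet below is only a minimal sketch using the standard library. The helper name `live_object_census` is hypothetical, and `gc.get_objects()` reports only container objects tracked by the collector, so plain ints and strings are not counted.

```python
import gc
from collections import Counter


def live_object_census(top=5):
    """Summarize the objects tracked by the garbage collector in this process.

    Hypothetical helper for illustration. Only gc-tracked (container) objects
    are counted; atomic objects such as ints and strings do not appear.
    """
    gc.collect()  # drop unreachable cycles first so only live objects remain
    objects = gc.get_objects()
    by_type = Counter(type(obj).__name__ for obj in objects)
    return {"total": len(objects), "top_types": by_type.most_common(top)}


if __name__ == "__main__":
    census = live_object_census()
    print(census["total"])
    for type_name, count in census["top_types"]:
        print(type_name, count)
```

Since `client.run` executes a function on every worker and returns a dictionary keyed by worker address, calling `client.run(live_object_census)` before and after a batch of tasks should show whether the number of live objects on the workers actually grows. If the totals stay roughly flat while unmanaged memory rises, leftover Python objects are probably not the main cause.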

I saw a related query raised in the Dask discussion forum, but it had no suitable answers.

Below is sample code that replicates the approach I have been implementing.

import numpy as np
import gc
import dask
from dask.distributed import Client

client = Client(n_workers=8)

# Naive function which is to be called multiple times. Please do not try to parallelize the
# internal body of the code based on its simplicity. The actual code is not that simple.
def native_function(N):
    curr_sample_1 = frozenset(np.random.default_rng().choice(N, size=500))
    curr_sample_2 = frozenset(np.random.default_rng().choice(N, size=500))
    sum_of_overlap = sum(curr_sample_1.intersection(curr_sample_2))
    
    del curr_sample_1, curr_sample_2  # To avoid increase in unmanaged memory via any "hoarding"
    gc.collect()

    return sum_of_overlap



def another_function(iters):
    list_of_overlap_sums = []
    for i in range(iters):
        native_func_result = dask.delayed(native_function)(10_000)
        list_of_overlap_sums.append(native_func_result)
    
    final_result = dask.delayed(sum)(list_of_overlap_sums).compute()
    
    client.cancel(native_func_result)  # Again to avoid accumulation of unmanaged memory
    client.run(gc.collect)

    return final_result


for i in range(100):
    print(another_function(100))

Below is a picture showing the worker memory status at process completion. The process has stopped, yet most of the memory in use is unmanaged. Although this is below the threshold, it can pose problems in the long run.

[Image: worker memory status at process completion]
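To separate "memory still referenced by Python" from "memory the allocator has not returned to the OS", one can compare traced allocations before and after many calls. The following is a rough sketch rather than a diagnosis of the code above. `sample_task` is a hypothetical, standard-library stand-in for `native_function`, and `traced_growth` is an illustrative helper, not part of Dask.

```python
import gc
import random
import tracemalloc


def sample_task(n, size=500):
    # Hypothetical stand-in for native_function, using only the stdlib.
    first = frozenset(random.sample(range(n), size))
    second = frozenset(random.sample(range(n), size))
    return sum(first & second)


def traced_growth(func, calls, *args):
    """Run func `calls` times and return (bytes still allocated, peak bytes).

    Both figures cover only allocations that tracemalloc traced while the
    loop was running.
    """
    gc.collect()
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(calls):
        func(*args)
    gc.collect()
    after, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before, peak


if __name__ == "__main__":
    growth, peak = traced_growth(sample_task, 1000, 10_000)
    print("still allocated after the loop:", growth, "bytes")
    print("peak during the loop:", peak, "bytes")
```

If the first figure stays small no matter how many calls are made, the function itself is not retaining objects. In that case any memory the process still holds is more likely kept by the allocator or by the worker's own bookkeeping than by leaked Python objects.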
