I'm working on some code that I plan to run on a server in the near future. Right now it works on my local machine, but on the server multiple people will be running the program at the same time. I'm worried that together they will use more RAM or VRAM than is available. If I use dask will it wait for available resources before executing the function call?

Example Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
from numba import njit
import numpy as np
from dask.distributed import Client, LocalCluster

@njit
def addingNumbers(big_array, big_array2, save_array):
    # Despite the name, this writes the element-wise product into save_array.
    for i in range(big_array.shape[0]):
        for j in range(big_array.shape[1]):
            save_array[i, j] = big_array[i, j] * big_array2[i, j]

    return save_array


if __name__ == "__main__":
    cluster = LocalCluster()
    client = Client(cluster)


    big_array = np.random.random_sample((100, 3000))
    big_array2  = np.random.random_sample((100, 3000))
    save_array = np.zeros(shape=(100, 3000))


    x = client.submit(addingNumbers, big_array, big_array2, save_array)
    y = client.gather(x)

If several people ran the code above at the same time and the server was almost out of RAM, would Dask wait until RAM was available before running the function, or would it submit the function anyway and let the server hit an out-of-memory error?

If Dask doesn't wait until RAM is available, how would you queue the function call? Thanks

1 Answer

MRocklin On Best Solutions

If I use dask will it wait for available resources before executing the function call?

Dask cannot predict how much RAM your function will need. However, you can set a memory limit on each worker. As the data a worker stores approaches that limit, the worker pushes some of it to disk, and if memory use keeps climbing it stops starting new tasks. See https://distributed.dask.org/en/latest/worker.html#memory-management
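As a rough sketch of what setting that limit might look like on a LocalCluster: the helper names `multiply` and `run_with_memory_limit` are made up for illustration, the "1GB" figure is arbitrary, and the threshold fractions are the documented defaults at the time of writing, so check them against your installed version.

```python
import dask
import numpy as np
from dask.distributed import Client, LocalCluster


def multiply(a, b):
    # Stand-in for the real workload (hypothetical helper for this sketch).
    return a * b


def run_with_memory_limit(memory_limit="1GB", processes=True):
    # Fractions of memory_limit at which a worker reacts.
    # These values mirror Dask's documented defaults; tune them as needed.
    thresholds = {
        "distributed.worker.memory.target": 0.60,     # start spilling stored data to disk
        "distributed.worker.memory.spill": 0.70,      # spill based on process memory
        "distributed.worker.memory.pause": 0.80,      # stop starting new tasks
        "distributed.worker.memory.terminate": 0.95,  # nanny restarts the worker (processes=True only)
    }
    with dask.config.set(thresholds):
        with LocalCluster(
            n_workers=1,
            threads_per_worker=1,
            processes=processes,
            memory_limit=memory_limit,  # applies per worker, not to the whole server
        ) as cluster, Client(cluster) as client:
            a = np.random.random_sample((100, 3000))
            b = np.random.random_sample((100, 3000))
            result = client.submit(multiply, a, b).result()
            # Report the limit the scheduler recorded for each worker, in bytes.
            workers = client.scheduler_info()["workers"]
            limits = [info["memory_limit"] for info in workers.values()]
    return result, limits
```

With `processes=True`, call this from inside an `if __name__ == "__main__":` block, as in the question. Note that the limit only covers memory Dask's own worker uses; it does not make Dask wait for RAM held by other programs on the server.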

how would you queue the function call?

The simplest solution is to limit the number of active threads in a worker, or to use worker resources to limit the concurrency of only certain tasks on each worker.
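A minimal sketch of the worker-resources approach, under some assumptions: the resource label "MEMORY" is an arbitrary name invented here (Dask does not measure it, it only counts it), and `multiply` and `run_queued` are hypothetical helpers. The worker advertises one unit of the resource, and each task claims one unit, so only one such task runs at a time even though the worker has two threads.

```python
import numpy as np
from dask.distributed import Client, LocalCluster


def multiply(a, b):
    # Stand-in for the memory-hungry function (hypothetical helper).
    return a * b


def run_queued(n_jobs=4, processes=True):
    # The worker has two threads but advertises a single unit of an
    # abstract resource called "MEMORY" (an arbitrary label for this sketch).
    with LocalCluster(
        n_workers=1,
        threads_per_worker=2,
        processes=processes,
        resources={"MEMORY": 1},
    ) as cluster, Client(cluster) as client:
        futures = []
        for _ in range(n_jobs):
            a = np.random.random_sample((100, 3000))
            b = np.random.random_sample((100, 3000))
            # Each task claims the whole "MEMORY" unit, so tasks run one at a
            # time on this worker; the others wait in the scheduler's queue.
            futures.append(client.submit(multiply, a, b, resources={"MEMORY": 1}))
        results = client.gather(futures)
    return results
```

Setting `threads_per_worker=1` and dropping the `resources` arguments would give the simpler thread-limit variant, which throttles every task rather than only the tagged ones. Either way, the queue exists only within one cluster: if each user starts a separate LocalCluster, their tasks are not queued against each other, so all users would presumably need to connect to one shared scheduler.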