Python concurrent.futures - writing to a global variable in parent process


I want to use concurrent.futures together with numpy to manipulate two scipy.sparse matrices:

import numpy as np
import scipy.sparse
matrix_A = scipy.sparse.lil_matrix((1000, 1000), dtype=np.float32)
matrix_B = scipy.sparse.lil_matrix((500, 1000), dtype=np.float32)

The algorithm works like this: every row in matrix_B has a one-to-many relationship to rows in matrix_A. For every row_B in matrix_B, I find its corresponding rows [row_A1, row_A2, ..., row_An] in matrix_A, sum them up, and assign the sum to row_B.

def update_values(row):
    indices, values = find_rows_in_matrix_A(row)
    matrix_B[row, indices] = values
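
For illustration, the row-summing step described above can also be written as a single sparse matrix product, which avoids per-row assignment altogether. This is only a sketch under assumptions: the dictionary `mapping` (row of matrix_B to its rows of matrix_A) and the helper `sum_rows` are hypothetical names, and the tiny shapes are chosen purely so the result can be checked by hand.

```python
import numpy as np
import scipy.sparse as sp


def sum_rows(matrix_A, mapping, n_rows_B):
    """Return B where B[i] is the sum of the rows of matrix_A listed in mapping[i].

    `mapping` is a hypothetical stand-in for the one-to-many relationship.
    """
    # Row/column coordinates of a 0/1 selection matrix S (n_rows_B x n_rows_A).
    b_idx = np.concatenate(
        [np.full(len(a_rows), b, dtype=np.int64) for b, a_rows in mapping.items()]
    )
    a_idx = np.concatenate(
        [np.asarray(a_rows, dtype=np.int64) for a_rows in mapping.values()]
    )
    ones = np.ones(len(a_idx), dtype=matrix_A.dtype)
    S = sp.csr_matrix((ones, (b_idx, a_idx)), shape=(n_rows_B, matrix_A.shape[0]))
    # Each row of S picks out, and thereby sums, the matching rows of matrix_A.
    return S @ matrix_A


# Small example data (assumed, not from the question).
matrix_A = sp.csr_matrix(np.arange(20, dtype=np.float32).reshape(5, 4))
mapping = {0: [0, 2], 1: [1], 2: [2, 3, 4]}
matrix_B = sum_rows(matrix_A, mapping, n_rows_B=3)
print(matrix_B.toarray())
```

Whether this fits depends on whether the mapping can be computed up front; if it can, the product runs in compiled code and no shared writable state is needed.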

In practice the matrices are large (about 10^7 rows; the shapes above are only illustrative), and I'd like to run this operation in parallel:

with concurrent.futures.ProcessPoolExecutor(max_workers=32) as executor:
    futures = {row: executor.submit(update_values, row)
               for row in range(matrix_B.shape[0])}

But this doesn't work: each child process operates on its own copy of matrix_B, so changes made to global variables in the children are invisible to the parent process (as mentioned in this answer).

Another option would be to return the values from update_values, but that would require merging the values in the parent process, which takes too long for my use case.
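
For reference, a minimal sketch of that return-and-merge option. It assumes the cost of merging comes mostly from assigning into a lil_matrix row by row; collecting (row, column, value) triplets per chunk and building the matrix once may be cheaper, though that is untested at 10^7 rows. Here `find_rows_in_matrix_A` is a dummy stand-in for the real lookup, and `process_chunk`, `assemble`, and `build_parallel` are hypothetical helper names.

```python
import concurrent.futures

import numpy as np
import scipy.sparse as sp


def find_rows_in_matrix_A(row):
    # Dummy stand-in (assumption): returns column indices and values for one row.
    indices = np.array([row % 4, (row + 1) % 4], dtype=np.int64)
    values = np.array([1.0, 2.0], dtype=np.float32)
    return indices, values


def process_chunk(rows):
    """Runs in a worker: compute COO triplets for a whole chunk of rows."""
    r, c, v = [], [], []
    for row in rows:
        indices, values = find_rows_in_matrix_A(row)
        r.append(np.full(len(indices), row, dtype=np.int64))
        c.append(indices)
        v.append(values)
    return np.concatenate(r), np.concatenate(c), np.concatenate(v)


def assemble(triplets, shape):
    """Runs in the parent: build the matrix once from all chunk results."""
    parts = list(triplets)
    rows = np.concatenate([p[0] for p in parts])
    cols = np.concatenate([p[1] for p in parts])
    vals = np.concatenate([p[2] for p in parts])
    # Note: converting COO to CSR sums duplicate (row, col) entries.
    return sp.coo_matrix((vals, (rows, cols)), shape=shape).tocsr()


def build_parallel(shape, chunk_size=1000, max_workers=4):
    """Split the rows into chunks so each task returns a few large arrays."""
    chunks = [range(start, min(start + chunk_size, shape[0]))
              for start in range(0, shape[0], chunk_size)]
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        return assemble(executor.map(process_chunk, chunks), shape)
```

Calling `build_parallel((500, 4))` should happen under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Submitting chunks rather than one task per row also keeps the pickling overhead per task small.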

Using multiprocessing.Manager().Array could be a solution, but that would create copies of the matrices at every write, and given their size, that's not an option.

Is there any way to make matrix_B writeable from child processes? Or what would be a better approach to this problem?
