Joblib parallel write to "shared" numpy sparse matrix


I'm trying to compute the number of shared neighbors for each node of a very large graph (~1M nodes). I'm using Joblib to run the computation in parallel, but I'm worried about the parallel writes to the sparse matrix that is supposed to hold all the data. Will this piece of code produce consistent results?

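import multiprocessing
import networkx as nx
import numpy as np
from joblib import Parallel, delayed
from scipy.sparse import csc_matrix

# G is the networkx graph, built elsewhere
vNum = 1259084
NN_Matrix = csc_matrix((vNum, vNum), dtype=np.int8)

def nn_calc_parallel(node_id=None):
    # Map the flat index back to a (row, column) pair of node ids
    i, j = np.unravel_index(node_id, (vNum, vNum))
    NN_Matrix[i, j] = len(np.intersect1d(list(nx.neighbors(G, i)), list(nx.neighbors(G, j))))

num_cores = multiprocessing.cpu_count()
result = Parallel(n_jobs=num_cores)(delayed(nn_calc_parallel)(i) for i in range(vNum**2))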

If not, can you help me to solve this?


There is 1 answer

luky On

I needed to do the same thing. In my case it was enough to have each job return its own matrix and then merge the results into one matrix, which you can do this way:

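from joblib import Parallel, delayed
from scipy.sparse import vstack

# nn_calc_parallel must return its sparse matrix instead of writing to a shared one
matrixes = Parallel(n_jobs=-3)(delayed(nn_calc_parallel)(x) for x in documents)
matrix = vstack(matrixes)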

n_jobs=-3 means use all CPUs except two; with more workers it might throw memory errors.
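
To tie this back to the question: with joblib's default process-based backend, each worker operates on its own copy of NN_Matrix, so assignments made inside nn_calc_parallel never reach the matrix in the parent process. Returning results and stacking them avoids that. Below is a minimal sketch of the idea on a tiny stand-in graph. The helper name shared_neighbors_row, the example edges, and the assumption that nodes are labeled 0..n-1 are all mine, not from the question.

```python
from collections import Counter

import networkx as nx
import numpy as np
from joblib import Parallel, delayed
from scipy.sparse import csr_matrix, vstack


def shared_neighbors_row(G, i, n):
    """Return a 1 x n sparse row; entry j counts the neighbors shared by i and j."""
    # Any node j that shares a neighbor k with i is itself a neighbor of k,
    # so walking the neighbors of i's neighbors finds every nonzero entry.
    counts = Counter(j for k in G.neighbors(i) for j in G.neighbors(k) if j != i)
    cols = list(counts.keys())
    data = list(counts.values())
    rows = [0] * len(cols)
    return csr_matrix((data, (rows, cols)), shape=(1, n), dtype=np.int32)


# Tiny stand-in graph (hypothetical data); nodes are assumed to be labeled 0..n-1.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])
n = G.number_of_nodes()

# Each job returns its own row; nothing is written to shared state.
rows = Parallel(n_jobs=2)(delayed(shared_neighbors_row)(G, i, n) for i in range(n))

# Stack the rows in node order to get the full n x n matrix.
NN_Matrix = vstack(rows).tocsr()
```

This only visits pairs that actually share a neighbor instead of all vNum**2 index pairs. On a graph of the size in the question, sending G along with every task would likely be expensive, so handing each job a batch of rows is probably a better fit. A wider dtype than int8 also avoids overflow once two nodes share more than 127 neighbors.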