I'm trying to compute the number of shared neighbors for each pair of nodes in a very big graph (~1M nodes). I'm using joblib to run the computation in parallel, but I'm worried about parallel writes to the sparse matrix that is supposed to hold all the data. Will this piece of code produce consistent results?
```python
import multiprocessing

import networkx as nx
import numpy as np
from joblib import Parallel, delayed
from scipy.sparse import csc_matrix

# G is the networkx graph with ~1M nodes (built elsewhere)
vNum = 1259084
NN_Matrix = csc_matrix((vNum, vNum), dtype=np.int8)

def nn_calc_parallel(node_id=None):
    # Map the flat pair index back to (row, column) node ids
    i, j = np.unravel_index(node_id, (vNum, vNum))
    NN_Matrix[i, j] = len(np.intersect1d(list(nx.neighbors(G, i)), list(nx.neighbors(G, j))))

num_cores = multiprocessing.cpu_count()
result = Parallel(n_jobs=num_cores)(delayed(nn_calc_parallel)(i) for i in range(vNum**2))
```
If not, can you help me solve this?
I needed to do the same kind of work. Since joblib runs its workers in separate processes, each worker writes to its own copy of the matrix rather than to a single shared one; in my case it was fine to have each job build its own matrix and then merge them all into one, which you can do this way:
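Here is a minimal sketch of that approach (the `nn_chunk` helper, the chunking, and the small random stand-in graph are my own illustration, not the original code): each job returns a private COO matrix for its chunk of pairs, and the per-job matrices are summed into the final result at the end.

```python
import networkx as nx
import numpy as np
from joblib import Parallel, delayed
from scipy.sparse import coo_matrix, csc_matrix

G = nx.fast_gnp_random_graph(1000, 0.01, seed=0)  # small stand-in for the big graph
vNum = G.number_of_nodes()

def nn_chunk(pairs):
    # Each job fills its own COO triplets, so there are no writes to shared state
    rows, cols, vals = [], [], []
    for i, j in pairs:
        n = len(np.intersect1d(list(G[i]), list(G[j])))
        if n:
            rows.append(i)
            cols.append(j)
            vals.append(n)
    return coo_matrix((vals, (rows, cols)), shape=(vNum, vNum), dtype=np.int8)

pairs = np.array(list(G.edges()))     # whichever (i, j) pairs you need
chunks = np.array_split(pairs, 32)    # one chunk of pairs per job
results = Parallel(n_jobs=-3)(delayed(nn_chunk)(c) for c in chunks)
NN_Matrix = csc_matrix(sum(results))  # merge: sum the per-job matrices
```

Summing is cheap here because each pair appears in only one chunk, so the per-job matrices have disjoint nonzeros.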
n_jobs=-3 means all CPUs except two; if you use every core it might throw some memory errors.