I wrote a piece of code, but I am not sure whether we can get rid of the loops and vectorize it to make it faster. Can you please give suggestions? I am just updating the co-occurrence matrix.

    import numpy as np

    # corpus: list of tokenized documents; words: vocabulary; window_size: int
    M = np.zeros((num_words, num_words))
    word2Ind = {w: i for i, w in enumerate(words)}
    for document in corpus:
        for i, word in enumerate(document):
            # scan the window around position i, skipping the word itself
            for j in range(i - window_size, i + window_size + 1):
                if i != j and 0 <= j < len(document):
                    M[word2Ind[word], word2Ind[document[j]]] += 1

Since the only thing you use word2Ind for is pieces like word2Ind[document[?]], you could at least start by computing the indices for each document once and for all, and then work from those indices. It then becomes easier to vectorize slightly.
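A minimal sketch of that idea, assuming every token in a document appears in words and that num_words == len(words); the function name cooccurrence is hypothetical. Each document is mapped to an index array once. For every offset d inside the window, NumPy slicing then pairs all positions k with k + d at the same time, so the only Python loop left per document runs over the window offsets.

```python
import numpy as np


def cooccurrence(corpus, words, window_size):
    """Hypothetical helper: co-occurrence counts, vectorized over positions."""
    word2Ind = {w: i for i, w in enumerate(words)}
    M = np.zeros((len(words), len(words)))
    for document in corpus:
        # Map the document to vocabulary indices once.
        idx = np.array([word2Ind[w] for w in document], dtype=np.intp)
        # Offset d pairs position k with k + d. The window is symmetric,
        # so count both the (left, right) and (right, left) directions.
        for d in range(1, window_size + 1):
            if d >= len(idx):
                break  # no pairs this far apart in a short document
            # np.add.at accumulates repeated (row, col) pairs; a plain
            # M[rows, cols] += 1 would count each duplicate pair only once.
            np.add.at(M, (idx[:-d], idx[d:]), 1)
            np.add.at(M, (idx[d:], idx[:-d]), 1)
    return M
```

For example, cooccurrence([["a", "a"]], ["a"], 1) gives a 1×1 matrix containing 2.0, which matches the original loop. The np.add.at call is the important design choice: within one document the same word pair can occur at several positions, and only an unbuffered accumulation counts each occurrence.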