Using a Word2vec (skip-gram) model in TensorFlow, I wrote code to obtain word embeddings from a document set. The final embeddings are in numpy.ndarray format.
Now, to find similar documents, I need to use the WMD (Word Mover's Distance) algorithm.
(I don't have much knowledge of gensim.) gensim.similarities.WmdSimilarity() seems to require the embeddings to be in the KeyedVectors data type. What can I do to implement WMD in my code? I have a tight deadline and can't spend much time writing WMD from scratch.
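To make this concrete, here is an untested sketch of what I think I need. vocab_list and embedding_matrix are placeholders for what my TensorFlow code already produces, and I'm not sure the KeyedVectors construction (gensim 4.x method names) is the right approach:

    import numpy as np
    from gensim.models import KeyedVectors
    from gensim.similarities import WmdSimilarity

    # placeholders for what my TensorFlow code already produces
    vocab_list = ['king', 'queen', 'rabbit']                              # vocabulary words
    embedding_matrix = np.random.rand(len(vocab_list), 128).astype(np.float32)  # my numpy.ndarray embeddings

    # wrap the numpy embeddings in a gensim KeyedVectors object
    # (gensim 4.x names; older versions use kv.add() instead of kv.add_vectors())
    kv = KeyedVectors(vector_size=embedding_matrix.shape[1])
    kv.add_vectors(vocab_list, embedding_matrix)

    # tokenized documents I want to search over
    corpus = [['king', 'queen'], ['rabbit']]

    # WMD in gensim needs an extra dependency (pyemd or POT, depending on the gensim version)
    index = WmdSimilarity(corpus, kv, num_best=2)
    print(index[['queen', 'rabbit']])                                     # most similar docs to the query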
If you're looking for similarity between 2 words, use:

    my_gensim_word2vec_model.similarity('king', 'queen')

my_gensim_word2vec_model is the gensim model, of course, not your own tensorflow model. If you want the words most similar to a bunch of words:

    my_gensim_word2vec_model.most_similar(positive=['king', 'queen', 'rabbit'])

Check the gensim docs.
If you're looking for similarity between sentences or documents, you're better off using doc2vec, which gives you a vector for each document as well as for the vocabulary words.
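A minimal doc2vec sketch along those lines (gensim 4.x names, so model.dv rather than the older model.docvecs; the toy corpus and parameters are just placeholders):

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    # toy tokenized corpus; each document gets an integer tag
    corpus = [['king', 'rules', 'the', 'castle'],
              ['queen', 'rules', 'the', 'palace'],
              ['rabbit', 'eats', 'carrots']]
    tagged = [TaggedDocument(words=doc, tags=[i]) for i, doc in enumerate(corpus)]

    model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)

    # infer a vector for an unseen document and rank the training docs by similarity
    query_vec = model.infer_vector(['king', 'and', 'queen'])
    print(model.dv.most_similar([query_vec], topn=2))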
Or take the average of all the word vectors in the sentence/document to get a vector for that document, then compute the cosine similarity between the averaged vectors of the two sentences/documents you want to compare.
For example:
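(A rough sketch, not tested against your setup; it assumes the word vectors are accessible as a gensim KeyedVectors, e.g. my_gensim_word2vec_model.wv in newer gensim versions, and that at least some tokens are in the vocabulary.)

    import numpy as np

    def avg_vector(kv, tokens):
        # average the vectors of the tokens that are in the vocabulary
        vecs = [kv[w] for w in tokens if w in kv]
        return np.mean(vecs, axis=0)

    def cosine(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    kv = my_gensim_word2vec_model.wv   # or the model itself in older gensim versions

    sent1 = 'the king rules the castle'.split()
    sent2 = 'the queen rules the palace'.split()
    print(cosine(avg_vector(kv, sent1), avg_vector(kv, sent2)))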
(Your question is a bit unclear: what is the document set, and what exactly is your task?) Hope this helps.