Getting the probability of a text from a gensim word2vec model


I am trying to find the most probable sequence of words using a gensim word2vec model. I found a pretrained model that ships as these files:

word2vec.bin
word2vec.bin.syn0.npy
word2vec.bin.syn1neg.npy

This is my code for getting the probability of a sentence with this model:

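from gensim.models import Word2Vec

model = Word2Vec.load(word_embedding_model_path)
# Attempt to switch the loaded model to hierarchical softmax (this still raises the error below)
model.hs = 1
model.negative = 0
print(model.score([sentence.split(" ")]))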

While running this code I am getting this error:

AttributeError: 'Word2Vec' object has no attribute 'syn1'

Can anyone help me figure out how to solve this problem? In general, I want to use a pretrained model to get the probability of a sequence of words appearing together.

There is 1 answer

Answered by gojomo

You can't switch a model from negative sampling (e.g. negative=5, hs=0) to hierarchical softmax (e.g. hs=1, negative=0) after initial setup and training. The two modes use different internal properties, which are created only during setup and training. (For example, the property syn1 exists only in a model that was created and trained in hierarchical-softmax mode.)
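To see which mode a loaded model was trained in, one can look for the output-weight arrays it carries. The helper below is a minimal sketch: `output_layers` is a hypothetical name, not a gensim function, and the attribute location is an assumption, since some gensim releases keep these arrays on `model.trainables` rather than on the model itself. The demo uses a stand-in object so that it runs without the pretrained files.

```python
from types import SimpleNamespace


def output_layers(model):
    """Return the names of the output-weight arrays present on a model."""
    # Assumption: the arrays live either on the model or on model.trainables,
    # depending on the gensim version.
    holders = [model, getattr(model, "trainables", None)]
    found = []
    for name in ("syn1", "syn1neg"):
        for holder in holders:
            if holder is not None and getattr(holder, name, None) is not None:
                found.append(name)
                break
    return found


# Stand-in for a model trained with negative sampling (not a real gensim object).
stand_in = SimpleNamespace(syn1neg=[[0.0, 0.0]])
print(output_layers(stand_in))  # ['syn1neg']
```

A model saved alongside a `.syn1neg.npy` file, as in the question, would be expected to report only `syn1neg`, which points to negative-sampling training and explains the missing `syn1`.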

Since the score() method is currently only functional for HS models, you can use it only with models that were trained in that mode.

(Note also that a value from score() of a single text, against a single model, isn't interpretable as an absolute probability. It's only in comparison against the scores of other texts against the same model, or the same text against alternate models, that the relative value of the score becomes meaningful.)
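One way to make the "same text against alternate models" comparison concrete is to normalize the scores into relative weights. The sketch below assumes each score is a log-likelihood and that the models are equally plausible beforehand; `relative_weights` is a hypothetical helper, and the score values are invented placeholders rather than real model output.

```python
import math


def relative_weights(log_scores):
    """Normalize per-model log-likelihoods for one text into weights summing to 1."""
    # Subtract the maximum before exponentiating to avoid underflow.
    peak = max(log_scores.values())
    exps = {name: math.exp(value - peak) for name, value in log_scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Hypothetical scores of one sentence under two separately trained HS models.
scores = {"model_a": -42.0, "model_b": -45.0}
weights = relative_weights(scores)
print(weights)
```

Only the difference between the two scores matters here: shifting both by the same amount leaves the weights unchanged, which is why a single score on its own says little.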