Why does Kenlm lm model keep returning the same score for different words?


Why is the KenLM model returning the same scores for different sentences? I have also tried it with a 4-gram ARPA file and got the same issue.

import kenlm
model = kenlm.Model('lm/test.arpa')  # unigram model

print( [f'{x[0]:.2f}, {x[1]}, {x[2]}' for x in model.full_scores('this is a sentence', bos=False, eos=False)])
print( [f'{x[0]:.2f}, {x[1]}, {x[2]}' for x in model.full_scores('this is a sentence1', bos=False, eos=False)])
print( [f'{x[0]:.2f}, {x[1]}, {x[2]}' for x in model.full_scores('this is a devil', bos=False, eos=False)])

Result:

['-2.00, 1, True', '-21.69, 1, False', '-1.59, 1, False', '-2.69, 1, True']

['-2.00, 1, True', '-21.69, 1, False', '-1.59, 1, False', '-2.69, 1, True']

['-2.00, 1, True', '-21.69, 1, False', '-1.59, 1, False', '-2.69, 1, True']
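The raw lists are hard to read because they don't show which word each tuple belongs to. Below is a minimal sketch of a hypothetical helper, label_scores (not part of kenlm), that pairs each word with its (log10 probability, n-gram length, OOV flag) tuple. It assumes bos=False and eos=False, so full_scores yields exactly one tuple per word.

```python
from typing import Iterable, List, Tuple

# One (log10 probability, n-gram length, is_oov) tuple per word,
# the shape yielded by kenlm's Model.full_scores.
Score = Tuple[float, int, bool]


def label_scores(sentence: str, scores: Iterable[Score]) -> List[str]:
    """Hypothetical helper: pair each whitespace-separated word with its score.

    Assumes bos=False and eos=False; with eos=True, full_scores also yields
    a final tuple for </s>, and the lengths would no longer match.
    """
    words = sentence.split()
    scores = list(scores)
    if len(words) != len(scores):
        raise ValueError(f"expected {len(words)} scores, got {len(scores)}")
    return [
        f"{word}: log10p={logprob:.2f}, ngram={length}, oov={oov}"
        for word, (logprob, length, oov) in zip(words, scores)
    ]
```

Usage with a loaded model would look like label_scores(s, model.full_scores(s, bos=False, eos=False)). Applied to the output above, it shows that the flagged positions are "this" and the final word.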

Answer by sourabh gupta:

Figured it out by myself.

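The third value in each tuple returned by full_scores tells you whether the word is OOV (out of vocabulary): True means OOV. The first value is the log10 probability and the second is the length of the n-gram that was matched. KenLM scores every OOV word as <unk>, so in the same context all OOV words receive the same fixed probability. In the examples in the question, the last words ("sentence", "sentence1", "devil") are all OOV, which is why the three outputs are identical. ("this" is flagged as OOV as well.)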
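To illustrate why different OOV words get identical scores, here is a minimal sketch of a hypothetical toy unigram model (ToyUnigramLM is not KenLM's implementation, and the log10 probabilities below are made up for illustration). Any word missing from the vocabulary is looked up as <unk>, so "sentence1" and "devil" end up with exactly the same score.

```python
from typing import Dict, List, Tuple


class ToyUnigramLM:
    """Hypothetical stand-in for a unigram LM; not KenLM's implementation.

    Every word missing from the vocabulary is scored as <unk>, mirroring
    how an ARPA model handles OOV words.
    """

    def __init__(self, log10_probs: Dict[str, float]) -> None:
        if "<unk>" not in log10_probs:
            raise ValueError("vocabulary must contain <unk>")
        self.log10_probs = log10_probs

    def full_scores(self, sentence: str) -> List[Tuple[float, int, bool]]:
        """Return (log10 prob, n-gram length, is_oov) per word, like kenlm."""
        out = []
        for word in sentence.split():
            oov = word not in self.log10_probs
            key = "<unk>" if oov else word  # OOV words all share <unk>'s score
            out.append((self.log10_probs[key], 1, oov))  # unigram: length is 1
        return out


# Made-up illustrative probabilities, not taken from the question's model.
lm = ToyUnigramLM({"<unk>": -3.0, "this": -1.2, "is": -1.1, "a": -0.9, "sentence": -2.5})
print(lm.full_scores("this is a sentence1"))
print(lm.full_scores("this is a devil"))  # same as the line above
```

With a real kenlm.Model, a membership test such as 'devil' in model should report whether a word is in the vocabulary before scoring. In a higher-order model the OOV score also includes the backoff weight of the preceding context, so OOV words only get identical scores when their context is identical.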