NLTK's sentence_nist() raises ZeroDivisionError when the hypothesis and reference are the same


When the hypothesis and reference are identical, NLTK's sentence_nist() raises a ZeroDivisionError. Here is my code:

from nltk.translate.nist_score import sentence_nist

ref_test = '太少'
hypo_test = '太少'
# splitKeyword is my own helper (definition not shown here)
split_ref = ' '.join(splitKeyword(ref_test))
split_hypo = ' '.join(splitKeyword(hypo_test))

# Case 1: Chinese characters separated by spaces
nist_test = sentence_nist([split_ref], split_hypo) # ZeroDivisionError: division by zero
print(nist_test)

# Case 2: the original strings, without splitting the Chinese characters
nist_test = sentence_nist([ref_test], hypo_test) # ZeroDivisionError: division by zero
print(nist_test)

Both cases raise the same error:

ZeroDivisionError: division by zero

I expected a high NIST score, since the hypothesis and reference are identical.

How can I correctly calculate the NIST score of the Chinese sentence pair above?

Detailed traceback

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[41], line 6
      4 split_ref = ' '.join(splitKeyword(ref_test))
      5 split_hypo = ' '.join(splitKeyword(hypo_test))
----> 6 nist_test = sentence_nist([split_ref], split_hypo)
      7 #nist_test = sentence_nist([ref_test], hypo_test)
      8 print(nist_test)

File ~/.local/lib/python3.10/site-packages/nltk/translate/nist_score.py:70, in sentence_nist(references, hypothesis, n)
     18 def sentence_nist(references, hypothesis, n=5):
     19     """
     20     Calculate NIST score from
     21     George Doddington. 2002. "Automatic evaluation of machine translation quality
   (...)
     68     :type n: int
     69     """
---> 70     return corpus_nist([references], [hypothesis], n)

File ~/.local/lib/python3.10/site-packages/nltk/translate/nist_score.py:165, in corpus_nist(list_of_references, hypotheses, n)
    162 nist_precision = 0
    163 for i in nist_precision_numerator_per_ngram:
    164     precision = (
--> 165         nist_precision_numerator_per_ngram[i]
    166         / nist_precision_denominator_per_ngram[i]
    167     )
    168     nist_precision += precision
    169 # Eqn 3 in Doddington(2002)
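
One possible reading of the traceback, offered as a hedged sketch rather than a confirmed diagnosis: the failing line divides by nist_precision_denominator_per_ngram[i] for each n-gram order i, and the signature shows a default of n=5. Assuming that denominator is the number of order-i n-grams in the hypothesis, a hypothesis with only two or three tokens has no 4-grams or 5-grams, so the denominator for those orders would be zero. The snippet below does not call NLTK; ngram_counts is a hypothetical helper written only to illustrate that counting argument.

```python
def ngram_counts(tokens, max_order):
    """Count how many n-grams of each order 1..max_order a token sequence contains.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    return {i: max(len(tokens) - i + 1, 0) for i in range(1, max_order + 1)}


# Two tokens, one per Chinese character: ['太', '少']
hypothesis = list("太少")

# With the default n=5 from the sentence_nist signature, orders 3..5 have no n-grams.
counts = ngram_counts(hypothesis, 5)
print(counts)  # {1: 2, 2: 1, 3: 0, 4: 0, 5: 0}

# Orders whose count is zero would be zero denominators under the assumption above.
zero_orders = [order for order, count in counts.items() if count == 0]
print(zero_orders)  # [3, 4, 5]

# Capping the maximum order at the hypothesis length leaves no empty order.
safe_n = min(5, len(hypothesis))
safe_counts = ngram_counts(hypothesis, safe_n)
print(safe_counts)  # {1: 2, 2: 1}
```

If that reading is right, passing token lists and a smaller n, for example sentence_nist([list(ref_test)], list(hypo_test), n=2), might avoid the division by zero. That call has not been run here, and the resulting score for such a short pair is not something this sketch predicts.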
