NLTK's sentence_nist() raises a ZeroDivisionError when the hypothesis and reference are identical. Here is my code:
from nltk.translate.nist_score import sentence_nist

# Stand-in for my helper: splits a Chinese string into individual characters
def splitKeyword(text):
    return list(text)

ref_test = '太少'
hypo_test = '太少'
split_ref = ' '.join(splitKeyword(ref_test))    # '太 少'
split_hypo = ' '.join(splitKeyword(hypo_test))  # '太 少'

# Case 1: each Chinese character separated by a space
nist_test = sentence_nist([split_ref], split_hypo)  # ZeroDivisionError: division by zero
print(nist_test)

# Case 2: without splitting up the Chinese characters
nist_test = sentence_nist([ref_test], hypo_test)  # ZeroDivisionError: division by zero
print(nist_test)
Both calls fail with the same error:

ZeroDivisionError: division by zero
I expected the NIST score to be high when the hypothesis and the reference are identical.
How can I correctly calculate the NIST score for the Chinese sentence pair above?
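For reference, here is a minimal sketch of the workaround I am experimenting with: passing lists of tokens instead of space-joined strings, and capping n at the hypothesis length. The n=len(hypo_tokens) choice is my own assumption (my guess is that the default n=5 exceeds the two-character sentence, leaving no higher-order n-grams), not something I found in the NLTK docs:

from nltk.translate.nist_score import sentence_nist

# Tokenize into lists of characters instead of space-joined strings
ref_tokens = list('太少')   # ['太', '少']
hypo_tokens = list('太少')  # ['太', '少']

# Assumption: cap n at the hypothesis length so no n-gram order is empty
score = sentence_nist([ref_tokens], hypo_tokens, n=len(hypo_tokens))
print(score)

Is lowering n like this the right way to handle very short sentences, or is there a proper fix?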
Detailed Traceback
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
Cell In[41], line 6
      4 split_ref = ' '.join(splitKeyword(ref_test))
      5 split_hypo = ' '.join(splitKeyword(hypo_test))
----> 6 nist_test = sentence_nist([split_ref], split_hypo)
      7 #nist_test = sentence_nist([ref_test], hypo_test)
      8 print(nist_test)

File ~/.local/lib/python3.10/site-packages/nltk/translate/nist_score.py:70, in sentence_nist(references, hypothesis, n)
     18 def sentence_nist(references, hypothesis, n=5):
     19     """
     20     Calculate NIST score from
     21     George Doddington. 2002. "Automatic evaluation of machine translation quality
   (...)
     68     :type n: int
     69     """
---> 70     return corpus_nist([references], [hypothesis], n)

File ~/.local/lib/python3.10/site-packages/nltk/translate/nist_score.py:165, in corpus_nist(list_of_references, hypotheses, n)
    162 nist_precision = 0
    163 for i in nist_precision_numerator_per_ngram:
    164     precision = (
--> 165         nist_precision_numerator_per_ngram[i]
    166         / nist_precision_denominator_per_ngram[i]
    167     )
    168     nist_precision += precision
    169 # Eqn 3 in Doddington(2002)

ZeroDivisionError: division by zero