I have around 200 candidate sentences, and for each candidate I want to measure the BLEU score by comparing it against thousands of reference sentences. These references are the same for all candidates. Here is how I'm doing it right now:
from nltk.translate.bleu_score import corpus_bleu

# the same reference list is reused for every candidate
ref_for_all = [reference] * len(sents)
# weights=(0, 1, 0, 0) scores on bigrams only
score = corpus_bleu(ref_for_all, [i.split() for i in sents], weights=(0, 1, 0, 0))
Here, reference contains the whole corpus I want to compare each sentence against, and sents holds my candidate sentences. Unfortunately, this takes too long, and given the experimental nature of my code, I cannot wait that long for results. Is there another way (for example, using regex) to get these scores faster? I have the same problem with ROUGE, so any suggestion for that is highly appreciated too!
After searching, experimenting with different packages, and measuring the time each one needed to calculate the scores, I found nltk's corpus_bleu and PyRouge to be the most efficient. Keep in mind that each of my records had multiple hypotheses, which is why I calculate the mean once per record. This is how I did it for BLEU:
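In nltk, sentence_bleu is simply corpus_bleu applied to a single hypothesis, so scoring each hypothesis and averaging per record looks something like the sketch below (the record_bleu helper, the smoothing choice, and the variable names are my illustrative assumptions, not the exact original code):

from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
import numpy as np

smooth = SmoothingFunction().method1  # assumption: pick whatever smoothing method you prefer

def record_bleu(hypotheses, references):
    # hypotheses: list of candidate strings belonging to one record
    # references: list of reference strings shared by all candidates
    tokenized_refs = [ref.split() for ref in references]
    scores = [
        sentence_bleu(tokenized_refs, hyp.split(),
                      weights=(0, 1, 0, 0),  # bigram-only BLEU, as in the question
                      smoothing_function=smooth)
        for hyp in hypotheses
    ]
    return float(np.mean(scores))  # one mean per record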
For ROUGE:
from rouge_metric import PyRouge

# ROUGE-1, ROUGE-2, and ROUGE-L only; the other variants are disabled
rouge = PyRouge(rouge_n=(1, 2), rouge_l=True, rouge_w=False, rouge_s=False, rouge_su=False)
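Scoring then looks roughly like this (a sketch assuming the rouge-metric package, whose evaluate takes plain strings and one reference list per hypothesis; hypotheses and references are illustrative names):

# one list of references per hypothesis -- all identical in my case
scores = rouge.evaluate(hypotheses, [references] * len(hypotheses))
# scores is a dict such as {'rouge-1': {'r': ..., 'p': ..., 'f': ...}, 'rouge-2': ..., 'rouge-l': ...}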
Then, to take the mean over all records:
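Something along these lines, assuming the per-record values were collected into a list while looping over the records (again a sketch with assumed names, not the exact original code):

import numpy as np

# assumption: all_records holds the hypothesis lists, one per record,
# and references is the shared reference list from above
per_record_bleu = [record_bleu(hyps, references) for hyps in all_records]
overall_bleu = float(np.mean(per_record_bleu))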
I hope it helps anyone who's had this problem.