Understanding ROUGE vs BLEU


I am looking into metrics for measuring the quality of text summarization. I found this SO answer, which states:

Bleu measures precision: how much the words (and/or n-grams) in the machine generated summaries appeared in the human reference summaries.

Rouge measures recall: how much the words (and/or n-grams) in the human reference summaries appeared in the machine generated summaries.

However, this SE answer says:

ROUGE-n recall=40% means that 40% of the n-grams in the reference summary are also present in the generated summary.

ROUGE-n precision=40% means that 40% of the n-grams in the generated summary are also present in the reference summary.

ROUGE-n F1-score=40% is more difficult to interpret, like any F1-score.
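To make the three quoted definitions concrete, here is a minimal sketch of ROUGE-N precision, recall and F1 for a single generated/reference pair. It assumes whitespace tokenization, lowercasing and clipped n-gram counts; the function names `ngrams` and `rouge_n` are made up for illustration and do not correspond to any particular ROUGE package, which may also apply stemming or handle multiple references.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(generated, reference, n=1):
    """Return (precision, recall, f1) for one generated/reference pair.

    Assumption: plain whitespace tokenization and lowercasing only.
    """
    gen = ngrams(generated.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)

    # Counter intersection keeps the minimum count per n-gram (clipping),
    # so a repeated n-gram is only credited as often as it occurs in both.
    overlap = sum((gen & ref).values())
    gen_total = sum(gen.values())
    ref_total = sum(ref.values())

    # Precision divides by the generated side, recall by the reference side.
    precision = overlap / gen_total if gen_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


generated = "the cat sat on the mat"
reference = "the cat was sitting on the mat today"
p, r, f = rouge_n(generated, reference, n=1)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
# precision=0.833 recall=0.625 f1=0.714
```

In this toy example 5 unigrams overlap; the generated summary has 6 unigrams (precision 5/6) and the reference has 8 (recall 5/8). The only difference between the two values is which side supplies the denominator.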

This seems contradictory. It sounds like ROUGE precision corresponds to the description of BLEU, and ROUGE recall corresponds to the description of ROUGE, in the SO answer. Is ROUGE precision the same thing as BLEU, in the sense that it effectively implements BLEU?

It is also stated in the paper:

It is clear that ROUGE-N is a recall-related measure because the denominator of the equation is the total sum of the number of n-grams occurring at the reference summary side. A closely related measure, BLEU, used in automatic evaluation of machine translation, is a precision-based measure.
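For reference, assuming "the paper" is the original ROUGE paper (Lin, 2004), the equation that this passage refers to is, as far as I can tell, the following, where $\text{Count}_{\text{match}}(\text{gram}_n)$ is the maximum number of times an n-gram co-occurs in the generated summary and a reference summary:

```latex
\text{ROUGE-N} =
\frac{\displaystyle\sum_{S \in \{\text{ReferenceSummaries}\}} \; \sum_{\text{gram}_n \in S} \text{Count}_{\text{match}}(\text{gram}_n)}
     {\displaystyle\sum_{S \in \{\text{ReferenceSummaries}\}} \; \sum_{\text{gram}_n \in S} \text{Count}(\text{gram}_n)}
```

The denominator counts n-grams on the reference side only, which is why the passage calls the measure recall-related. Dividing the same numerator by the number of n-grams in the generated summary instead would give a precision-style value.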

I don't understand this, since ROUGE returns (at least) a precision and a recall value. Can anybody clarify this? Thank you!
