Correct NMT metrics with Fairseq on non-Latin languages

36 views Asked by UpmostScarab At 11 September 2023 at 14:04

As you may know, to calculate BLEU properly you need to pass a tokenizer in its parameters. In my case I'm working with Korean, so I would expect to pass --tokenize ko-mecab to sacrebleu. I know that fairseq calculates BLEU for the translation task during validation steps, but I found no way to pass that option through (I even opened an issue: https://github.com/facebookresearch/fairseq/issues/5308).
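One possible workaround, sketched here as an assumption rather than a fairseq feature, is to score the output of fairseq-generate offline with the sacrebleu Python API, where the tokenizer can be chosen freely. The sketch assumes the generate log contains the usual tab-separated `T-<id>` (reference) and `D-<id>` (detokenized hypothesis) lines, that the references in the log are already free of subword markers, and that sacrebleu was installed with Korean support (`pip install "sacrebleu[ko]"`). The helper names `read_generate_log` and `corpus_bleu_ko` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def read_generate_log(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Collect (hypotheses, references) from fairseq-generate style output.

    Assumed line layout (fields separated by a TAB character):
      T-<id> <TAB> reference
      D-<id> <TAB> score <TAB> detokenized hypothesis
    All other lines (S-, H-, P-, logging) are ignored.
    """
    refs: Dict[int, str] = {}
    hyps: Dict[int, str] = {}
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        tag = parts[0]
        if tag.startswith("T-") and len(parts) >= 2:
            refs[int(tag[2:])] = parts[1]
        elif tag.startswith("D-") and len(parts) >= 3:
            hyps[int(tag[2:])] = parts[2]
    # Keep only sentence ids that have both a hypothesis and a reference,
    # in ascending id order so the two lists stay aligned.
    ids = sorted(refs.keys() & hyps.keys())
    return [hyps[i] for i in ids], [refs[i] for i in ids]


def corpus_bleu_ko(hyps: List[str], refs: List[str], tokenize: str = "ko-mecab"):
    """Corpus-level BLEU with a sacrebleu tokenizer (needs sacrebleu[ko])."""
    # Imported lazily so the parser above works without sacrebleu installed.
    from sacrebleu.metrics import BLEU

    return BLEU(tokenize=tokenize).corpus_score(hyps, [refs])
```

With a saved log this could be used as `hyps, refs = read_generate_log(open("gen.out", encoding="utf-8"))` followed by `print(corpus_bleu_ko(hyps, refs))`, where `gen.out` is a placeholder file name. This only covers scoring after generation; it does not change what fairseq reports during validation.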

Another option I considered was using chrF, since it doesn't depend on tokenization, but as it seems from the code, fairseq only uses the BLEU metric from sacrebleu.
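To illustrate why a character n-gram F-score does not depend on word tokenization, here is a simplified, sentence-level sketch in plain Python. It is not sacrebleu's implementation and its numbers will not match sacrebleu in every case; `simple_chrf` and `char_ngrams` are hypothetical names, and the defaults (n-grams up to 6, beta = 2, whitespace ignored) are assumptions chosen to mirror the common chrF setup.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams after removing all whitespace."""
    chars = "".join(text.split())
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 6, beta: float = 2.0) -> float:
    """Simplified character n-gram F-beta score in the range [0, 1]."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            # One side is too short to have n-grams of this order; skip it.
            continue
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Because whitespace is dropped before counting, `simple_chrf("안녕 하세요", "안녕하세요")` and `simple_chrf("안녕하세요", "안녕하세요")` give the same score. For real evaluation, sacrebleu's own `sacrebleu.metrics.CHRF().corpus_score(hyps, [refs])` is the implementation to use on generated output.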

I'm also aware that there's an option to compute BLEU with your own tokenizer, but in that case the metric becomes tokenizer-dependent, which I also don't want.

I would be grateful for any kind of suggestions on the matter.


