I am trying to understand the result evaluation table (table 1) of this paper.
Three different accuracies are reported: overall, on unknown words (UW), and on known words (KW), along with the percentage of unknown words (% unk.).
Are the known words the data used for training, and the unknown words the data used for testing and validation?
What is the overall accuracy? How is it computed?
What is the percentage of unknown words (% unk.)? Is it a percentage of the test set?
Thank you.
There is a second table in this paper, which is a bit clearer:
So known words are words seen in training (or otherwise seen before, e.g. in a lexicon). KW is the accuracy of the tagger on those words (computed on the test data, of course). Words that did not appear in the training data (unknown words) are harder for the tagger to label, so the accuracy on them is lower.
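Here is a minimal sketch of how these three quantities are typically computed; the vocabulary and test triples below are hypothetical placeholders, not data from the paper:

```python
# Split tagger accuracy into known-word (KW) and unknown-word (UW) accuracy.
# The vocabulary and test set here are made up for illustration.

# Words observed in the training data (or a lexicon)
train_vocab = {"the", "cat", "sat"}

# Test set as (word, gold_tag, predicted_tag) triples
test = [
    ("the", "DET", "DET"),     # known word, correct
    ("cat", "NOUN", "NOUN"),   # known word, correct
    ("sat", "VERB", "NOUN"),   # known word, wrong
    ("mat", "NOUN", "NOUN"),   # unknown word, correct
    ("zorp", "NOUN", "VERB"),  # unknown word, wrong
]

known = [(g, p) for w, g, p in test if w in train_vocab]
unknown = [(g, p) for w, g, p in test if w not in train_vocab]

kw_acc = sum(g == p for g, p in known) / len(known)      # accuracy on known words
uw_acc = sum(g == p for g, p in unknown) / len(unknown)  # accuracy on unknown words
pct_unk = len(unknown) / len(test)                       # fraction of unknown tokens

print(f"KW = {kw_acc:.2%}, UW = {uw_acc:.2%}, %unk = {pct_unk:.2%}")
```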
As you can see, the overall accuracy is the accuracy on known words weighted by their proportion plus the accuracy on unknown words weighted by their proportion:

overall = KW * (1 - %unk) + UW * %unk
(%unk needs to be divided by 100.) This does not quite hold for the first table, however. I suspect that is because the authors did not measure those results themselves, and the original papers calculated the measures differently. You could try reading the original papers to understand their calculations.
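As a quick sanity check of that formula, with made-up numbers in the same percentage style as the tables:

```python
# Weighted-average formula above, using hypothetical KW, UW, % unk. values
# expressed as percentages (hence the division by 100).
kw, uw, pct_unk = 96.5, 85.0, 2.8

overall = kw * (1 - pct_unk / 100) + uw * (pct_unk / 100)
print(f"overall = {overall:.2f}")  # -> overall = 96.18
```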