StanfordNLP POS giving mixed results


I was testing the Stanford NLP POS Tagger and I am getting mixed results.

SOP(StanfordNLP.getInstance().getPOSMap("WHEAT flour(whole)".toLowerCase()));
SOP(StanfordNLP.getInstance().getPOSMap("Whole wheat flour".toLowerCase()));

This gives me the following output:

{NN=[wheat, flour, whole]}
{JJ=[whole], NN=[wheat, flour]}

How can I deal with problems like these? It's actually the same words rearranged.

EDIT

Maybe I should explain the problem.

I want to compare two sentences. My approach is to run POS tagging on both strings and then compare and score the nouns, adjectives, and verbs from each string separately.

But because the tagging is fuzzy and depends on word order (as also referred to by @Elliott), my ranking fails in some cases. Can someone suggest a workaround?

Is there a classification statistic that gives the probability of a noun being classified as an adjective, verb, etc., which I can use in my scoring algorithm to provide weights?

Thanks, Chahat

There are 2 answers

1
Elliott Beach On

POS taggers will always give results like these. POS tagging is contextual, since the same word can be a noun, adjective, or verb in different contexts. The tagger's statistical model decides how to tag each word based on the surrounding words and their order in the sentence, so rearranging the same words can change the tags.
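One possible workaround for the scoring problem, given as a rough sketch rather than a tested solution: compare the two sentences word by word, give full credit when both the word and its tag match, and give partial credit when the word matches but the tags differ. The class name `PosOverlapScorer`, the method names, and the weights `1.0` and `0.5` are all assumptions made for illustration; they do not come from the Stanford library. The input is assumed to be a tag-to-words map with the same shape as the output printed in the question.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PosOverlapScorer {

    // Hypothetical weights (assumptions, tune for your data):
    // full credit when word and tag both match, partial credit when
    // the word matches but the tagger assigned a different tag.
    static final double SAME_TAG_WEIGHT = 1.0;
    static final double DIFFERENT_TAG_WEIGHT = 0.5;

    // Inverts a tag -> words map (the shape printed in the question)
    // into a word -> tag map. If a word appears under several tags,
    // an arbitrary one of them is kept.
    static Map<String, String> wordToTag(Map<String, List<String>> posMap) {
        Map<String, String> result = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : posMap.entrySet()) {
            for (String word : entry.getValue()) {
                result.put(word, entry.getKey());
            }
        }
        return result;
    }

    // Returns a score in [0, 1]: the weighted count of shared words
    // divided by the number of distinct words across both inputs.
    static double score(Map<String, List<String>> first,
                        Map<String, List<String>> second) {
        Map<String, String> a = wordToTag(first);
        Map<String, String> b = wordToTag(second);

        Set<String> allWords = new HashSet<>(a.keySet());
        allWords.addAll(b.keySet());
        if (allWords.isEmpty()) {
            return 0.0;
        }

        double total = 0.0;
        for (String word : allWords) {
            String tagA = a.get(word);
            String tagB = b.get(word);
            if (tagA == null || tagB == null) {
                continue; // word occurs in only one sentence: no credit
            }
            total += tagA.equals(tagB) ? SAME_TAG_WEIGHT : DIFFERENT_TAG_WEIGHT;
        }
        return total / allWords.size();
    }

    public static void main(String[] args) {
        // The two tag maps from the question.
        Map<String, List<String>> first =
                Map.of("NN", List.of("wheat", "flour", "whole"));
        Map<String, List<String>> second =
                Map.of("JJ", List.of("whole"), "NN", List.of("wheat", "flour"));
        System.out.println(score(first, second));
    }
}
```

With the two maps from the question, "wheat" and "flour" earn full credit and "whole" earns partial credit, so the pair scores about 0.83 instead of being penalized as if "whole" were missing. The partial-credit weight is a crude stand-in for a real confusion probability between tags, which you would have to estimate from tagged data yourself.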

2
stealthyK On

The Stanford POS Tagger is pretty good. If, however, you want to easily see side-by-side comparisons with the standard NLTK tagger and another quality tagger called Senna, you could try this: https://github.com/StealthyK/TaggerTimer