I am trying to find the exact tag set used in the Hebrew treebank that Stanford NLP uses. Finding this tag set seems to be harder than finding a POS tagger :)
Are there any tools for reading the tag set that a (Penn?) treebank was annotated with?
For the stanfordnlp Python package, the POS tag set used for all languages is the Universal Dependencies (UD) v2 tag set. Some UD corpora also include an original POS tag set, which is often more fine-grained. The Hebrew Treebank was originally built with its own POS tag set and was then converted to UD, but the version supplied in the UD repository seems to come only with the UD tag set. Individual languages may use only a subset of the UD POS tag set. You can find the details on the treebank hub page for the Hebrew Treebank, which shows that 15 of the 17 UD POS tags are used.
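If you want to check the tag inventory yourself rather than rely on the hub page, UD treebanks are distributed as CoNLL-U files, where each token line has ten tab-separated columns and the universal tag (UPOS) and the treebank-specific tag (XPOS) are the fourth and fifth. Here is a minimal sketch that counts both; the function name `collect_tags` and the tiny `SAMPLE` text are made up for illustration and are not taken from the Hebrew treebank.

```python
from collections import Counter


def collect_tags(conllu_text):
    """Count UPOS and XPOS tags in CoNLL-U formatted text (hypothetical helper)."""
    upos, xpos = Counter(), Counter()
    for line in conllu_text.splitlines():
        # Skip blank sentence separators and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token line
        token_id = cols[0]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1"),
        # which do not carry the tags of a syntactic word.
        if "-" in token_id or "." in token_id:
            continue
        upos[cols[3]] += 1
        if cols[4] != "_":  # "_" means no treebank-specific tag is given
            xpos[cols[4]] += 1
    return upos, xpos


# Illustrative sample only, not real treebank data.
SAMPLE = "\n".join([
    "# text = Dogs bark .",
    "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
    "3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_",
])

upos_counts, xpos_counts = collect_tags(SAMPLE)
print(sorted(upos_counts))  # ['NOUN', 'PUNCT', 'VERB']
print(sorted(xpos_counts))  # [] because every XPOS field in the sample is "_"
```

To run it on a real treebank, read the downloaded `.conllu` file as UTF-8 text and pass its contents to `collect_tags`. An empty XPOS counter would be consistent with the treebank shipping only the UD tags.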