I want to train spaCy's NER model on my own corpus, which was annotated with WebAnno. Unfortunately, the label for one NE category differs between the two tools: WebAnno uses "OTH", whereas spaCy uses "MISC" (semantically, they are the same). Would this mismatch hurt training or test accuracy? Is it necessary to train an additional NE type "OTH" in this case? Thank you for your help!

spaCy version used: 2.2.5

There is 1 answer

Matthias Winkelmann (best answer)

Yes, the labels in your annotations should match the ones the model uses. If this is a one-off operation, the easiest fix is probably to brute-force it by replacing the label string in your data.
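The string replacement could look roughly like the sketch below. It assumes the WebAnno export has already been converted to spaCy v2's simple training format, i.e. a list of (text, {"entities": [(start, end, label)]}) tuples. The names LABEL_MAP, remap_labels and the example sentence are made up for illustration.

```python
# Assumed spaCy v2-style training data:
#   (text, {"entities": [(start_char, end_char, label)]})
LABEL_MAP = {"OTH": "MISC"}  # WebAnno label -> spaCy label (hypothetical name)


def remap_labels(train_data, label_map=LABEL_MAP):
    """Return a copy of train_data with entity labels renamed via label_map."""
    remapped = []
    for text, annotations in train_data:
        entities = [
            # Labels not listed in label_map are kept unchanged.
            (start, end, label_map.get(label, label))
            for start, end, label in annotations.get("entities", [])
        ]
        remapped.append((text, {**annotations, "entities": entities}))
    return remapped


# Hypothetical example sentence; offsets are character positions.
TRAIN_DATA = [
    (
        "Angela Merkel watched the Bundesliga final.",
        {"entities": [(0, 13, "PER"), (26, 36, "OTH")]},
    ),
]
CLEAN_DATA = remap_labels(TRAIN_DATA)
```

The function builds new tuples instead of editing in place, so the original annotations stay available for comparison.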

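A more canonical option would appear to be the tag map: https://spacy.io/usage/adding-languages#tag-map. Note that the tag map documented there covers part-of-speech tags rather than entity labels, so it may not carry over directly to NER. Quote: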

[...] you need to define how [your tags] map down to the Universal Dependencies tag set.

Their example:

from ..symbols import POS, NOUN, VERB, DET

TAG_MAP = {
    "NNS":  {POS: NOUN, "Number": "plur"},
    "VBG":  {POS: VERB, "VerbForm": "part", "Tense": "pres", "Aspect": "prog"},
    "DT":   {POS: DET}
}