CRF model trained on plural tokens does not tag singular forms


I have built a CRF model. My data set has 24 classes, and because I am just getting started, my training data contains only 1200 tokens. I have trained the model. In the training data I used the plural forms of tokens, such as addresses, photos, states, and countries.

At test time, if I give the model a sentence containing the plural forms, it works well. If I use the singular forms instead, such as photo or state, it does not assign any tag to them.

This behavior of the CRF looks very strange to me. I have explored NERFeatureFactory and tried some lemma features, but that did not help either. Below is the austen.prop file I used to build the model.

# location of the training file
trainFile = training_data_for_ner.txt
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = ner-model.ser.gz

# structure of your training file; this tells the classifier that
# the word is in column 0, the correct answer is in column 1,
# the POS tag is in column 2 and the lemma is in column 3
map = word=0,answer=1,pos=2,lemma=3

# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1

# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only 
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the next 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
# newly added features.
useLemmas=true
usePrevNextLemmas=true
useLemmaAsWord=true
useTags=true

The last four features were added after reading NERFeatureFactory. If anyone can help me solve this problem, I will be grateful.
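To illustrate what the "prefixes and suffixes only" n-gram setting in the file above implies for a plural/singular pair, here is a rough Python sketch. It is a simplified approximation, not NERFeatureFactory's actual feature extraction, and `edge_ngrams` is a hypothetical helper introduced only for this illustration.

```python
def edge_ngrams(word, max_len=6):
    """Return (prefixes, suffixes) of length 1..max_len.

    Simplified stand-in for useNGrams=true, noMidNGrams=true,
    maxNGramLeng=6; the real library may mark and filter n-grams
    differently.
    """
    limit = min(max_len, len(word))
    prefixes = {word[:n] for n in range(1, limit + 1)}
    suffixes = {word[-n:] for n in range(1, limit + 1)}
    return prefixes, suffixes


plural_prefixes, plural_suffixes = edge_ngrams("photos")
singular_prefixes, singular_suffixes = edge_ngrams("photo")

# Edge n-grams that the training form and the test form have in common.
shared_prefixes = plural_prefixes & singular_prefixes
shared_suffixes = plural_suffixes & singular_suffixes

print("shared prefixes:", sorted(shared_prefixes))
print("shared suffixes:", sorted(shared_suffixes))
```

In this simplified view the two forms share their prefixes but no suffix, and the exact word feature (useWord) also differs, so only part of what was learned for "photos" can apply to "photo".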

There is 1 answer

Answered by dveim

You should retrain the model with stemmed tokens. For an example of how to stem, see the main method of https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/process/Stemmer.java.
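As a minimal sketch of that preprocessing step, the Python below rewrites the word column of a training file before retraining. It assumes the training file is tab-separated with the word in column 0 (as in the map line of the question). `toy_stem` is a hypothetical stand-in for a real stemmer such as the Porter stemmer linked above, and the labels MEDIA and PLACE are made up for the example.

```python
def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips common plural endings."""
    lower = token.lower()
    if lower.endswith("ies") and len(lower) > 4:
        return token[:-3] + "y"          # countries -> country
    if lower.endswith(("sses", "shes", "ches", "xes")):
        return token[:-2]                # addresses -> address
    if lower.endswith("s") and not lower.endswith("ss") and len(lower) > 3:
        return token[:-1]                # photos -> photo
    return token


def stem_training_lines(lines, stem=toy_stem, word_col=0):
    """Apply `stem` to the word column; blank lines (sentence breaks) are kept."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            out.append("")
            continue
        cols = line.split("\t")
        cols[word_col] = stem(cols[word_col])
        out.append("\t".join(cols))
    return out


# Made-up rows in the layout word, answer, pos, lemma.
sample = [
    "photos\tMEDIA\tNNS\tphoto",
    "of\tO\tIN\tof",
    "",
    "countries\tPLACE\tNNS\tcountry",
]
stemmed = stem_training_lines(sample)
print("\n".join(stemmed))
```

The same normalization has to be applied to the sentences you tag at test time, so that the model sees "photo" both during training and during tagging.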