CRF model trained on plural, not working on singular

Question

Asked by Hammad Hassan on 29 December 2016 at 13:18 (102 views)

I have built a CRF model. My data set has 24 classes, and because I am just getting started, my training data contains only 1200 tokens. I have trained the model. In the training data I used the plural forms of tokens, such as addresses, photos, states, and countries.

At test time, if I give the model sentences that contain the plural forms, it works well. But if I enter a sentence with the singular forms, such as photo or state, it does not assign any tag to them.

This behavior of the CRF looks very strange to me. I explored NERFeatureFactory and added some lemma features, but that did not help either. Here is the austen.prop I used to build the model.

# location of the training file
trainFile = training_data_for_ner.txt
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = ner-model.ser.gz

# structure of your training file; this tells the classifier that
# the word is in column 0, the correct answer is in column 1,
# the POS tag is in column 2, and the lemma is in column 3
map = word=0,answer=1,pos=2,lemma=3

# This specifies the order of the CRF: order 1 means that features
# apply at most to a class pair of previous class and current class
# or current class and next class.
maxLeft=1

# these are the features we'd like to train with
# some are discussed below, the rest can be
# understood by looking at NERFeatureFactory
useClassFeature=true
useWord=true
# word character ngrams will be included up to length 6 as prefixes
# and suffixes only 
useNGrams=true
noMidNGrams=true
maxNGramLeng=6
usePrev=true
useNext=true
useDisjunctive=true
useSequences=true
usePrevSequences=true
# the last 4 properties deal with word shape features
useTypeSeqs=true
useTypeSeqs2=true
useTypeySequences=true
wordShape=chris2useLC
# newly added features.
useLemmas=true
usePrevNextLemmas=true
useLemmaAsWord=true
useTags=true

The last four features were added after reading NERFeatureFactory. If anyone can help me solve this problem, I would be thankful.

There is 1 answer

**dveim** · Answer 1 · 2016-12-29T13:24:50+00:00

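You should retrain the model on stemmed tokens, so that the singular and plural forms map to the same token, and stem the input the same way at test time. For an example, see the main method of https://github.com/stanfordnlp/CoreNLP/blob/master/src/edu/stanford/nlp/process/Stemmer.java.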
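As a rough illustration of that preprocessing step, here is a minimal Python sketch that rewrites the word column of a training file before retraining. It assumes the training file is tab-separated with the word in column 0, as in the `map` line of the question. The function names (`naive_stem`, `stem_training_line`, `stem_training_file`) are hypothetical, and `naive_stem` is only a toy plural stripper standing in for a real stemmer such as the CoreNLP `Stemmer` linked above; it is not the Porter algorithm.

```python
def naive_stem(token: str) -> str:
    """Toy plural stripper. A stand-in for a real stemmer, NOT the Porter algorithm."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"          # countries -> country
    if len(token) > 4 and token.endswith(("sses", "shes", "ches", "xes")):
        return token[:-2]                # addresses -> address
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]                # photos -> photo, states -> state
    return token                         # photo, state, address stay unchanged


def stem_training_line(line: str, word_col: int = 0) -> str:
    """Stem only the word column of one tab-separated training line."""
    line = line.rstrip("\n")
    if not line.strip():
        return ""                        # keep blank separator lines as they are
    cols = line.split("\t")
    cols[word_col] = naive_stem(cols[word_col])
    return "\t".join(cols)


def stem_training_file(src_path: str, dst_path: str, word_col: int = 0) -> None:
    """Write a copy of the training file with the word column stemmed."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(stem_training_line(line, word_col) + "\n")
```

You could then call something like `stem_training_file("training_data_for_ner.txt", "training_data_stemmed.txt")` (the output file name is made up here), point `trainFile` at the new file, and retrain. The same normalization has to be applied to the tokens of every sentence you pass to the model at test time, otherwise the mismatch just moves to the other side.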
