Adapting StanfordCoreNLP to process noisy web text?


I've been trying out the StanfordCoreNLP annotators (NER, for example) manually on the website, and they seem to depend on quite specific, proper-English cues to detect entities. Web text, though, can look like this:


John Doe

Assistant Professor of Computer Science

Stanford University


StanfordCoreNLP has trouble with this: it labels the whole thing as one organization, due to the lack of prepositions and punctuation. Is there anything I can do to help NER handle this kind of text better (e.g., by pre-processing the text)?

Vanaja Jayaraman

Adding a period (.) at the end of each line gives better results, since the sentence splitter uses the period as a sentence delimiter.
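Below is a minimal sketch of that preprocessing step in plain Java (8+). It assumes lines are separated by `\n` or `\r\n`. `LinePunctuator` and `punctuateLines` are hypothetical names and not part of the CoreNLP API. You would pass the returned string to your CoreNLP pipeline as usual. Whether NER then labels every line correctly still depends on the model. This only gives the sentence splitter a boundary to work with.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of CoreNLP): appends a period to every
// non-blank line that does not already end in sentence-final punctuation,
// so the sentence splitter can treat each line as its own sentence.
final class LinePunctuator {

    private LinePunctuator() {}

    static String punctuateLines(String text) {
        // Split on \n or \r\n; the -1 limit keeps trailing empty strings.
        String[] lines = text.split("\\r?\\n", -1);
        List<String> out = new ArrayList<>(lines.length);
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                out.add("");  // blank lines are kept, but emptied of whitespace
                continue;
            }
            char last = trimmed.charAt(trimmed.length() - 1);
            boolean alreadyEnded = last == '.' || last == '!' || last == '?';
            out.add(alreadyEnded ? trimmed : trimmed + ".");
        }
        // Rejoin with plain \n line endings.
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String raw = "John Doe\n\nAssistant Professor of Computer Science\n\nStanford University";
        // Prints each line of the example with a trailing period added.
        System.out.println(punctuateLines(raw));
    }
}
```

Lines that already end in `.`, `!` or `?` are left unchanged, so existing sentences do not get a doubled period.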