I've been trying out the Stanford CoreNLP NER manually on the website, and it seems to depend on very specific, well-formed English cues to detect entities. Web text is often less tidy, though; for example, you could have some text like:
John Doe
Assistant Professor of Computer Science
Stanford University
Stanford CoreNLP seems to have some trouble here: it labels the whole thing as one organization, apparently because of the lack of prepositions and punctuation. Is there anything I can do to help NER handle this kind of text better (e.g. some pre-processing of the text)?
Adding a period (.) at the end of each line gives better results, since the sentence splitter uses the period as a delimiter.
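A minimal sketch of that pre-processing step in Python, run on the text before it is sent to the NER. The function name `add_line_terminators` and the set of characters treated as terminal punctuation are assumptions made for illustration, not part of CoreNLP:

```python
# Characters assumed to already end a sentence; adjust for your data.
TERMINAL_PUNCTUATION = (".", "!", "?")


def add_line_terminators(text: str) -> str:
    """Append a period to every non-empty line that lacks terminal punctuation."""
    processed = []
    for line in text.splitlines():
        stripped = line.rstrip()
        # Leave blank lines and already-terminated lines unchanged.
        if stripped and not stripped.endswith(TERMINAL_PUNCTUATION):
            stripped += "."
        processed.append(stripped)
    return "\n".join(processed)


sample = "John Doe\nAssistant Professor of Computer Science\nStanford University"
print(add_line_terminators(sample))
```

For the example in the question, this turns each of the three lines into its own period-terminated sentence, so the name, title, and university should reach the tagger as separate sentences instead of one run-on.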