Stanford CoreNLP library doesn't tokenize new lines


I'm trying to tokenize some text with the following code:

private String[] nlpProperties = {"annotators","tokenize","tokenize.verbose","true","tokenize.options","tokenizeNLs=true"};
Properties properties = PropertiesUtils.asProperties(nlpProperties);
StanfordCoreNLP coreNLP = new StanfordCoreNLP(properties);

String text = "John lives in London. \nHe is a doctor";
CoreDocument document = new CoreDocument(text);
coreNLP.annotate(document);
List<CoreLabel> tokens = document.tokens();

I'm getting the following 9 tokens: John, lives, in, London, ., He, is, a, doctor

According to the official documentation, I would expect setting the tokenizeNLs option to true to produce the newline character '\n' as a token, but that doesn't happen. Any idea what I'm doing wrong?
