Detokenization for Stanford CoreNLP


I used Stanford CoreNLP's tokenizer to split sentences into tokens. Now I need to detokenize the already tokenized words, i.e. I need a reverse tokenizer for Stanford CoreNLP. Is there a Java class in Stanford CoreNLP, or a Java/Python API, that I can use for this?

I/P:

I ca n't use this pen .
I have ( 5 ) points to explain .
I have discuss the 1,2,3 etc. ..

O/P: 

I can't use this pen.
I have (5) points to explain.
I have discuss the 1,2,3 etc... 

There is 1 answer

Answered by Manos Nikolaidis

The Sentence class from the Simple API has multiple constructors, one of which takes a List<String> argument.

So you can do something like:

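import java.util.List;
import edu.stanford.nlp.simple.Sentence;

List<String> words = new Sentence("I can't use this pen.").words();  // tokenize into a list of words
Sentence output = new Sentence(words);  // rebuild a Sentence from the token list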
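It is worth checking on your CoreNLP version whether the rebuilt Sentence restores the original spacing. If it does not, one fallback is a small rule-based pass over the tokens. The following is a minimal sketch, not part of CoreNLP: the class name SimpleDetokenizer, its method names, and the rule sets are all assumptions made for illustration, covering only the three example sentences in the question (clitics such as n't, brackets, and trailing punctuation).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not a CoreNLP class: joins tokens using a few spacing rules.
class SimpleDetokenizer {
    // Clitics that attach to the preceding token (no space before them).
    private static final Set<String> CLITICS = new HashSet<>(Arrays.asList(
            "n't", "'s", "'re", "'ve", "'ll", "'d", "'m"));

    // Opening brackets attach to the following token (no space after them).
    private static final Set<String> OPENERS = new HashSet<>(Arrays.asList("(", "[", "{"));

    // True if the token should be glued to the token before it.
    private static boolean attachesLeft(String token) {
        if (CLITICS.contains(token.toLowerCase(Locale.ROOT))) {
            return true;
        }
        // Tokens made only of closing punctuation, e.g. ".", "..", "," or ")".
        return token.matches("[.,;:!?)\\]}]+");
    }

    // Joins an already tokenized sentence back into a single string.
    static String detokenize(List<String> tokens) {
        StringBuilder sb = new StringBuilder();
        String previous = null;
        for (String token : tokens) {
            boolean needsSpace = previous != null
                    && !OPENERS.contains(previous)
                    && !attachesLeft(token);
            if (needsSpace) {
                sb.append(' ');
            }
            sb.append(token);
            previous = token;
        }
        return sb.toString();
    }

    // Convenience overload for a whitespace-separated token string.
    static String detokenize(String tokenized) {
        return detokenize(Arrays.asList(tokenized.trim().split("\\s+")));
    }
}
```

With these rules, SimpleDetokenizer.detokenize("I have ( 5 ) points to explain .") returns "I have (5) points to explain.". Quotation marks, escaped brackets such as -LRB-, and other tokenizer normalizations are not handled, so the rule sets would need extending for real text.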