Segmenting sentence into subsentences with CoreNLP

Question


Asked by moritz on 05 November 2018 at 13:07

I am working on the following problem: I would like to split sentences into subsentences using Stanford CoreNLP. The example sentence could be:

"Richard is working with CoreNLP, but does not really understand what he is doing"

I would now like my sentence to be split into the individual "S" nodes highlighted in my tree diagram (image not available), with the output being a list of those "S" constituents:

['Richard is working with CoreNLP', ', but', 'does not really understand what', 'he is doing']

I would be really thankful for any help :)


There are 2 answers

moritz on 06 November 2018 at 10:21

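OK, I found that one can do this as follows, by sending the text to the Tregex endpoint of a running CoreNLP server:

```python
import requests

# Assumes a CoreNLP server is already running locally on its default port 9000
url = "http://localhost:9000/tregex"
request_params = {"pattern": "S"}  # Tregex pattern: match every S (clause) node
text = "Pusheen and Smitha walked along the beach."
r = requests.post(url, data=text.encode("utf-8"), params=request_params)
r.raise_for_status()  # fail loudly on HTTP errors
print(r.json())
```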
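As a follow-up sketch, the matched text of each S node can be pulled out of `r.json()`. This assumes the `/tregex` response is shaped like `{"sentences": [{"0": {"match": ..., "spanString": ...}, ...}]}` (one dict per sentence, keyed by the stringified match index); that shape is from memory and may differ between CoreNLP versions. The function name `s_spans_from_tregex_response` is purely illustrative.

```python
def s_spans_from_tregex_response(response_json):
    """Return the surface text of every Tregex match, in match order.

    Assumption: the /tregex response looks like
    {"sentences": [{"0": {"match": "...", "spanString": "..."}, ...}, ...]},
    i.e. one dict per sentence, keyed by the stringified match index.
    """
    spans = []
    for sentence_matches in response_json.get("sentences", []):
        # Keys are "0", "1", ...; sort numerically so "10" comes after "9"
        for key in sorted(sentence_matches, key=int):
            spans.append(sentence_matches[key]["spanString"])
    return spans
```

With the request above, usage would be `print(s_spans_from_tregex_response(r.json()))`. Sorting the keys numerically rather than as strings avoids putting match "10" before match "2".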

Does anybody know how to use other languages (I need German)?

**Gabor Angeli** · Accepted Answer · 2018-11-06T06:39:00+00:00

I suspect the tool you're looking for is Tregex, which is described in more detail in a PowerPoint presentation and in the Javadoc of the TregexPattern class itself.

In your case, I believe the pattern you're looking for is simply S. So, something like:

```
tregex.sh "S" <path_to_file>
```

where the file is a Penn Treebank formatted tree -- that is, something like (ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats))))).
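For intuition about what the plain `S` pattern returns, here is a minimal pure-Python sketch (not Tregex itself, and not part of CoreNLP) that parses a bracketed tree like the one above and collects the words under every node labeled S, visiting nodes in pre-order. The helper names `parse_ptb`, `leaves` and `find_label` are hypothetical, and the parser assumes every opening bracket is followed by a label, so the unlabeled outer brackets some treebank files use are not handled.

```python
import re

def parse_ptb(text):
    """Parse a bracketed Penn Treebank string into (label, children) tuples.

    Words are kept as plain strings. Assumes every "(" is followed by a label.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return parse_node()

def leaves(node):
    """Return the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    return [word for child in node[1] for word in leaves(child)]

def find_label(node, label):
    """Yield every subtree with the given label, visiting nodes in pre-order."""
    if isinstance(node, str):
        return
    if node[0] == label:
        yield node
    for child in node[1]:
        yield from find_label(child, label)

tree = parse_ptb("(ROOT (S (NP (NNS dogs)) (VP (VB chase) (NP (NNS cats)))))")
print([" ".join(leaves(t)) for t in find_label(tree, "S")])  # ['dogs chase cats']
```

The example tree has a single S node, so only one span comes back; on trees with nested clauses every S is reported, including ones that contain other S nodes.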

As an aside: I believe the fragment ", but" is not actually a sentence, even though you've highlighted it as one in the figure. Rather, the node you've highlighted subsumes the whole sentence "Richard is working with CoreNLP, but does not really understand what he is doing". Tregex would then print out this whole sentence as one of the matches. Similarly, "does not really understand what" is not a sentence unless it subsumes the entire SBAR: "does not really understand what he is doing".

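If you want just the "leaf" sentences (i.e., S nodes that do not contain another S), you can try a pattern more like:

```
S !<< S
```

Conversely, S !>> S matches only the outermost sentences, i.e., S nodes that are not subsumed by another S.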

Note: I haven't tested the patterns -- use at your own risk!
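To see how `S !>> S` differs from its mirror image `S !<< S`, here is a small pure-Python emulation (not Tregex itself) run on a hypothetical, hand-written parse of the question's sentence; a real CoreNLP parse may well be shaped differently. It follows Tregex's convention that `A << B` means "A dominates B" and `A >> B` means "A is dominated by B": `topmost_s` mimics `S !>> S` and `leaf_s` mimics `S !<< S`. All helper names and the tree itself are assumptions for illustration.

```python
import re

# Hypothetical, hand-written parse of the question's sentence (NOT actual CoreNLP output).
HYPOTHETICAL_TREE = (
    "(ROOT (S"
    " (S (NP (NNP Richard)) (VP (VBZ is) (VP (VBG working) (PP (IN with) (NP (NNP CoreNLP))))))"
    " (, ,) (CC but)"
    " (S (VP (VBZ does) (RB not) (ADVP (RB really)) (VP (VB understand)"
    " (SBAR (WHNP (WP what)) (S (NP (PRP he)) (VP (VBZ is) (VP (VBG doing))))))))"
    "))"
)

def parse_ptb(text):
    """Parse a bracketed tree into (label, children) tuples; words stay strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # skip "("
        label, children = tokens[pos], []
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return node()

def leaves(n):
    """Words under a node, left to right."""
    return [n] if isinstance(n, str) else [w for c in n[1] for w in leaves(c)]

def s_nodes(n, under_s=False):
    """Yield (S subtree, is_dominated_by_another_S) in pre-order."""
    if isinstance(n, str):
        return
    if n[0] == "S":
        yield n, under_s
    for child in n[1]:
        yield from s_nodes(child, under_s or n[0] == "S")

def contains_s(n):
    """True if a proper descendant of n is labeled S (Tregex: n << S)."""
    return any(not isinstance(c, str) and (c[0] == "S" or contains_s(c)) for c in n[1])

def topmost_s(tree):  # roughly Tregex "S !>> S"
    return [" ".join(leaves(n)) for n, dominated in s_nodes(tree) if not dominated]

def leaf_s(tree):  # roughly Tregex "S !<< S"
    return [" ".join(leaves(n)) for n, _ in s_nodes(tree) if not contains_s(n)]

tree = parse_ptb(HYPOTHETICAL_TREE)
print(topmost_s(tree))  # ['Richard is working with CoreNLP , but does not really understand what he is doing']
print(leaf_s(tree))     # ['Richard is working with CoreNLP', 'he is doing']
```

On this assumed tree, neither ", but" nor "does not really understand what" comes out as its own S under either pattern, which is consistent with the point above that those fragments are not sentences on their own.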
