I currently use spaCy to traverse the dependency tree and generate entities:
    # Wrapper name is illustrative; get_spacy_model and detect_lang are my own helpers.
    def extract_entities(unicode_text):
        nlp = get_spacy_model(detect_lang(unicode_text))
        doc = nlp(unicode_text)
        entities = set()
        for sentence in doc.sents:
            # traverse the tree, picking up entities
            for token in sentence.subtree:
                # pick entities using some pre-defined rules
                pass
        entities.discard('')
        return entities
Are there any good Java alternatives for spaCy?
I am looking for libs which generate the Dependency Tree as is done by spaCy.
EDIT:
I looked into Stanford Parser. However, it generated the following parse tree:
                        ROOT
                         |
                         NP
               __________|___________
               |                    NP
               |              ______|______
               |              |           PP
               |              |     ______|______
               NP             NP    |           NP
      _________|_________     |     |     ______|______
     DT    JJ    JJ    NN    NNS   IN    DT    JJ    NN
      |     |     |     |     |     |     |     |     |
     the  quick brown  fox  jumps  over  the   lazy  dog
What I am looking for instead is a tree structure like the one spaCy produces:
                       jumps_VBZ
      _____________________|______________________
      |       |        |       |     |        over_IN
      |       |        |       |     |           |
      |       |        |       |     |         dog_NN
      |       |        |       |     |    _______|_______
    The_DT quick_JJ brown_JJ fox_NN ._. the_DT       lazy_JJ
You're looking for the Stanford Dependency Parser. Like most of the Stanford tools, it is bundled with Stanford CoreNLP, under the depparse annotator. Other parsers include the Malt parser (a feature-based shift-reduce parser) and Ryan McDonald's MST parser (an accurate but slower maximum spanning tree parser).
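To make the difference from the constituency tree in the question concrete, here is a minimal sketch in plain Java of what a dependency parse boils down to: one head index per token, which you can then walk much like spaCy's subtree. It uses no NLP library at all. The class name, the method names, and the hand-coded HEADS array are all hypothetical and simply mirror the spaCy-style tree shown in the question; a real dependency parser would supply the heads instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: a dependency tree stored as head indices.
public class DependencyTreeSketch {
    // Tokens of the example sentence, in surface order.
    static final String[] WORDS =
        {"The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog", "."};

    // HEADS[i] is the index of token i's head; -1 marks the root.
    // Hand-coded (an assumption) to match the tree drawn in the question.
    static final int[] HEADS = {4, 4, 4, 4, -1, 4, 8, 8, 5, 4};

    // Index of the root token (the one without a head).
    public static int root() {
        for (int i = 0; i < HEADS.length; i++) {
            if (HEADS[i] == -1) return i;
        }
        throw new IllegalStateException("no root token");
    }

    // Direct dependents of the given token, in surface order.
    public static List<Integer> children(int head) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < HEADS.length; i++) {
            if (HEADS[i] == head) result.add(i);
        }
        return result;
    }

    // Depth-first collection of a token and all of its descendants.
    private static void collect(int node, List<Integer> out) {
        out.add(node);
        for (int child : children(node)) collect(child, out);
    }

    // Words of the token and its descendants, in surface order.
    public static List<String> subtree(int node) {
        List<Integer> indices = new ArrayList<>();
        collect(node, indices);
        Collections.sort(indices);
        List<String> words = new ArrayList<>();
        for (int i : indices) words.add(WORDS[i]);
        return words;
    }

    public static void main(String[] args) {
        System.out.println("root: " + WORDS[root()]);
        System.out.println("subtree of 'over': " + subtree(5));
    }
}
```

Your entity-picking rules would then run over the tokens returned by subtree, the same way the Python loop does.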