Entity Relation Extraction Stanford CoreNLP


I am trying to extract relations from sentences in a block of text taken from a PDF. I used Stanford CoreNLP with the Python wrapper pycorenlp to parse the sentences, and now I want to extract the subject, verb, and object from each parse tree.

Here is a sample of my data: 'Mark Robert is the founder of 3trucks. 3trucks was founded in 2010'

Here is what I want as output: ('Mark Robert', 'founder', '3trucks') ('3trucks', 'founded', '2010')

Here is the code I ran on that text:

import nltk
import re
from pycorenlp import *

nlp = StanfordCoreNLP("http://localhost:9000/")

text = 'Mark Robert is the founder of 3trucks. 3trucks was founded in 2010'

output = nlp.annotate(text, properties={
    'annotators': 'tokenize,ssplit,pos,depparse,parse',
    'timeout': '50000',
    'outputFormat': 'json'
})

print(output['sentences'][0]['parse'])
print('------------------------------')
print(output['sentences'][1]['parse'])

My code's output:

(ROOT
(S
(NP (NNP Mark) (NNP Robert))
(VP (VBZ is)
  (NP
    (NP (DT the) (NN founder))
    (PP (IN of)
      (NP (NNS 3trucks)))))
(. .)))
------------------------------
(ROOT
(S
(NP (NNS 3trucks))
(VP (VBD was)
  (VP (VBN founded)
    (PP (IN in)
      (NP (CD 2010)))))))

There is 1 answer

John Jonas On

You can include 'openie' in the list of annotators. OpenIE extracts relation triples of the form (subject, relation, object), which is the structure you want. Also remember to keep only those three fields from each result.

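output = nlp.annotate(text, properties={
    # 'lemma' is added because the CoreNLP docs list it as a prerequisite of natlog/openie
    "annotators": "tokenize,ssplit,pos,lemma,depparse,natlog,openie",
    "outputFormat": "json",
    "openie.triple.strict": "true",
    "openie.max_entailments_per_clause": "1"
})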

After that, you can format the output as you need.

# Walk every sentence (not just the first) and print its OpenIE triples
for sentence in output["sentences"]:
    for rel in sentence["openie"]:
        relation_sent = rel['subject'], rel['relation'], rel['object']
        print(relation_sent)
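If you want the triples as a list instead of printed lines, the loop can be wrapped in a small helper. This is only a sketch: `collect_triples` and the `sample` dictionary are names made up for illustration, and `sample` is a hand-written stand-in for the shape of the JSON response (a "sentences" list whose items carry an "openie" list), not real CoreNLP output. It also assumes the response was already parsed into a dict.

```python
from typing import Any, Dict, List, Tuple

Triple = Tuple[str, str, str]


def collect_triples(annotation: Dict[str, Any]) -> List[Triple]:
    """Return (subject, relation, object) tuples for every sentence."""
    if not isinstance(annotation, dict):
        # Assumption: anything other than a dict means the JSON was not parsed.
        raise TypeError("expected the parsed JSON response as a dict")
    triples: List[Triple] = []
    for sentence in annotation.get("sentences", []):
        # A sentence may have no "openie" entry; treat that as no triples.
        for rel in sentence.get("openie", []):
            triple = (rel["subject"], rel["relation"], rel["object"])
            if triple not in triples:  # drop exact duplicates, keep order
                triples.append(triple)
    return triples


# Hand-written placeholder data that mimics the response shape only.
sample = {
    "sentences": [
        {"openie": [{"subject": "A", "relation": "is founder of", "object": "B"}]},
        {"openie": [
            {"subject": "B", "relation": "was founded in", "object": "C"},
            {"subject": "B", "relation": "was founded in", "object": "C"},
        ]},
        {},  # sentence with no extractions
    ]
}

for triple in collect_triples(sample):
    print(triple)
```

With the real server you would pass the `output` returned by `nlp.annotate(...)` instead of `sample`.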

Hope this helps.