I am trying to do anaphora resolution, and my code so far is below.
First I navigate to the folder where I downloaded the Stanford CoreNLP module. Then I run the following command in the command prompt to start the Stanford CoreNLP server:
java -mx4g -cp "*;stanford-corenlp-full-2017-06-09/*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000
After that I execute the code below in Python:
from pycorenlp import StanfordCoreNLP
nlp = StanfordCoreNLP('http://localhost:9000')
I want to change the sentence Tom is a smart boy. He know a lot of thing.
into Tom is a smart boy. Tom know a lot of thing.
but I could not find any tutorial or other help for doing this in Python.
All I am able to do is annotate the text with the code below in Python:
# coreference resolution
output = nlp.annotate(sentence, properties={'annotators':'dcoref','outputFormat':'json','ner.useSUTime':'false'})
and by extracting the coreference chains from the result:
coreferences = output['corefs']
I get the JSON below:
coreferences
{u'1': [{u'animacy': u'ANIMATE',
u'endIndex': 2,
u'gender': u'MALE',
u'headIndex': 1,
u'id': 1,
u'isRepresentativeMention': True,
u'number': u'SINGULAR',
u'position': [1, 1],
u'sentNum': 1,
u'startIndex': 1,
u'text': u'Tom',
u'type': u'PROPER'},
{u'animacy': u'ANIMATE',
u'endIndex': 6,
u'gender': u'MALE',
u'headIndex': 5,
u'id': 2,
u'isRepresentativeMention': False,
u'number': u'SINGULAR',
u'position': [1, 2],
u'sentNum': 1,
u'startIndex': 3,
u'text': u'a smart boy',
u'type': u'NOMINAL'},
{u'animacy': u'ANIMATE',
u'endIndex': 2,
u'gender': u'MALE',
u'headIndex': 1,
u'id': 3,
u'isRepresentativeMention': False,
u'number': u'SINGULAR',
u'position': [2, 1],
u'sentNum': 2,
u'startIndex': 1,
u'text': u'He',
u'type': u'PRONOMINAL'}],
u'4': [{u'animacy': u'INANIMATE',
u'endIndex': 7,
u'gender': u'NEUTRAL',
u'headIndex': 4,
u'id': 4,
u'isRepresentativeMention': True,
u'number': u'SINGULAR',
u'position': [2, 2],
u'sentNum': 2,
u'startIndex': 3,
u'text': u'a lot of thing',
u'type': u'NOMINAL'}]}
Any help on this?
Here is one possible solution that uses the data structure output by CoreNLP, which already contains all the information needed. It is not intended as a full solution, and extensions are probably required to deal with all situations, but it is a good starting point.
This gives the following output:
As you can see, this solution doesn't correct the case when a pronoun has a sentence-initial (title-case) antecedent ("The big cat" instead of "the big cat" in the last sentence). The fix depends on the category of the antecedent: common noun antecedents need lowercasing, while proper noun antecedents don't. Some other ad hoc processing might be necessary (as for the possessives in my test sentence). The solution also presupposes that you will not want to reuse the original output tokens, as they are modified by this code. A way around this would be to make a copy of the original data structure, or to create a new attribute and change the print_resolved function accordingly. Correcting any resolution errors is yet another challenge!