Replace personal pronoun with previous person mentioned (noisy coref)

I want to do a noisy resolution such that, given a personal pronoun, that pronoun is replaced by the previous (nearest) person mentioned.

For example:

Alex is looking at buying a U.K. startup for $1 billion. He is very confident that this is going to happen. Sussan is also in the same situation. However, she has lost hope.

The output is:

Alex is looking at buying a U.K. startup for $1 billion. Alex is very confident that this is going to happen. Sussan is also in the same situation. However, Sussan has lost hope.

Another example:

Peter is a friend of Gates. But Gates does not like him.

In this case, the output would be:

Peter is a friend of Gates. But Gates does not like Gates.

Yes! This is super noisy.

Using spaCy, I have extracted the persons with NER, but how can I replace the pronouns appropriately?

Code:

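import spacy

nlp = spacy.load("en_core_web_sm")
text = "Alex is looking at buying a U.K. startup for $1 billion. He is very confident that this is going to happen."
doc = nlp(text)  # run the pipeline so that doc.ents is available
for ent in doc.ents:
    if ent.label_ == 'PERSON':
        print(ent.text, ent.label_)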

There are 2 answers

Answer by thorntonc (best answer)

I have written a function that works for your two examples. Consider using a larger model such as en_core_web_lg for more accurate tagging.

import spacy

nlp = spacy.load("en_core_web_lg")

def pronoun_coref(text):
    doc = nlp(text)
    # (start token index, text) of every PERSON entity, in document order
    names = [(ent.start, ent.text) for ent in doc.ents if ent.label_ == 'PERSON']
    pieces = []
    for tok in doc:
        # PERSON entities that start before this token; the last one is the nearest
        previous = [name for start, name in names if start < tok.i]
        if tok.tag_ == "PRP" and previous:
            # reuse the pronoun's own trailing whitespace so spacing is preserved,
            # including when the pronoun is the first or last token
            pieces.append(previous[-1] + tok.whitespace_)
        else:
            pieces.append(tok.text_with_ws)
    return ''.join(pieces)
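
To see the replacement rule in isolation, here is a minimal sketch that does not need spaCy. It assumes you already have the tokens, the positions of the pronouns, and the (start index, name) pairs of the PERSON entities sorted by position. The function name replace_with_nearest and its parameters are hypothetical and only for illustration. It uses bisect to find the nearest preceding name instead of filtering the whole list for every pronoun.

```python
from bisect import bisect_left

def replace_with_nearest(tokens, pronoun_idx, name_starts):
    """Replace each pronoun with the nearest name that starts before it.

    tokens:       list of token strings
    pronoun_idx:  set of token positions that hold pronouns
    name_starts:  list of (token index, name) pairs, sorted by index
    (All three are hypothetical inputs; with spaCy they would come from
    tok.tag_ == "PRP" and doc.ents.)
    """
    positions = [start for start, _ in name_starts]
    out = list(tokens)
    for i in sorted(pronoun_idx):
        # k is the number of names that start strictly before position i
        k = bisect_left(positions, i)
        if k:  # leave the pronoun alone if no name precedes it
            out[i] = name_starts[k - 1][1]
    return out

tokens = ["Peter", "is", "a", "friend", "of", "Gates", ".",
          "But", "Gates", "does", "not", "like", "him", "."]
result = replace_with_nearest(
    tokens, {12}, [(0, "Peter"), (5, "Gates"), (8, "Gates")]
)
print(" ".join(result))
```

In this example "him" at position 12 is replaced by the second "Gates" at position 8, which is the noisy behaviour the question asks for.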
Answer by Sergey Bushmanov

There is a dedicated library, neuralcoref, for resolving coreference. See the minimal reproducible example below:

import spacy
import neuralcoref

nlp = spacy.load('en_core_web_sm')
neuralcoref.add_to_pipe(nlp)
doc = nlp(
'''Alex is looking at buying a U.K. startup for $1 billion. 
He is very confident that this is going to happen. 
Sussan is also in the same situation. 
However, she has lost hope.
Peter is a friend of Gates. But Gates does not like him.
          ''')

print(doc._.coref_resolved)

Alex is looking at buying a U.K. startup for $1 billion. 
Alex is very confident that this is going to happen. 
Sussan is also in the same situation. 
However, Sussan has lost hope.
Peter is a friend of Gates. But Gates does not like Peter.
 

Note that you may run into issues with neuralcoref if you install it with pip, so it is better to build it from source, as I outlined here