I have a set of acknowledgements extracted from academic papers that contain sentences like the following:
I would like to thank PERSON1 for helping me with this paper.
We gratefully acknowledge PERSON2 for operating the equipment.
PERSON3 and PERSON4 are thanked for their guidance.
Thank you to PERSON5, who set up the experiment.
PERSON6 analysed the data, and for this we are thankful.
I used named entity recognition to extract the person names, and I am now trying to capture what each person did. Ideally I'd like to end up with a dataset like this:
Person | Contribution |
---|---|
PERSON1 | helping me with this paper |
PERSON2 | operating the equipment |
PERSON3 | their guidance |
PERSON4 | their guidance |
PERSON5 | set up the experiment |
PERSON6 | analysed the data |
Is there any way to capture this information using spaCy (or another Python tool)? The result doesn't have to be perfect: I don't mind if I sometimes capture extra information or miss information, as long as I catch most cases.
A couple of notes:
In real life, the sentences can be much more complicated, e.g. "Thanks to PERSON1 for X and PERSON2 for Y and...". Contributions can also be longer, such as "thank you to PERSON3 for kindly providing the manuscript which is described in detail below, and for being a good friend, and for always having my back."
I don't need to specifically check for words like "thank" or "acknowledge". I just want to catch the action that belongs to each person (understanding that I might also catch cases that aren't contributions).
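
To make "catch most cases" concrete, here is a minimal sketch of how a candidate approach could be scored against the table above. The names `EXPECTED` and `recall` are hypothetical helpers for illustration, not part of spaCy or any other library, and treating a case-insensitive substring match as a hit is an assumption about what counts as "good enough":

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (person placeholder, contribution text)

# Hand-labelled rows copied from the target table above.
EXPECTED: List[Row] = [
    ("PERSON1", "helping me with this paper"),
    ("PERSON2", "operating the equipment"),
    ("PERSON3", "their guidance"),
    ("PERSON4", "their guidance"),
    ("PERSON5", "set up the experiment"),
    ("PERSON6", "analysed the data"),
]


def recall(predicted: List[Row], expected: List[Row] = EXPECTED) -> float:
    """Fraction of expected rows that were recovered.

    A row counts as recovered if some predicted row has the same person and
    its contribution contains the expected text (case-insensitive), so extra
    captured words are tolerated.
    """
    if not expected:
        return 0.0
    hits = 0
    for person, contribution in expected:
        if any(
            p == person and contribution.lower() in c.lower()
            for p, c in predicted
        ):
            hits += 1
    return hits / len(expected)
```

Only recall is measured here. Extra rows or extra words in a captured contribution are not penalised, which matches the tolerance described above.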