I am working on named entity and attribute extraction, where my objective is to extract the attributes associated with a particular entity in a sentence.
For example: "The Patient report is Positive for ABC disease"
In the above sentence, ABC is an entity and Positive is an attribute describing ABC.
I am looking for a concise approach to extract the attributes. I have already built a solution for extracting entities, which works with respectable accuracy, and I am now working on the second part of the problem: extracting the attributes associated with each entity.
I tried extracting attributes with a rule-based approach, which gives decent results but has the following cons:
- Source code is unmanageable.
- It's not at all generic, and new scenarios are difficult to handle.
- It is time-consuming.
To find a more generic solution, I explored different NLP techniques and found dependency tree parsing to be a potential solution.
I am looking for suggestions/inputs on how to solve this problem using dependency tree parsing in Python/Java.
Feel free to suggest any other technique which could potentially help here.
I suggest using the spaCy Python library because it is easy to use and has a decent dependency parser. A baseline solution would traverse the dependency tree in a breadth-first fashion, starting from your entity of interest, until it encounters a token that looks like an attribute or until it walks too far from the entity.
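As an illustration only, a breadth-first traversal of this kind might be sketched as follows. Several things here are assumptions rather than part of the suggestion above: the `Node` class is a hypothetical stand-in that mimics the `text`, `pos_`, `head`, and `children` attributes of a spaCy token so the sketch runs without a model; the tree is built by hand and is only one plausible parse of the example sentence; "looks like an attribute" is reduced to "is an adjective"; and "too far" is a cutoff of 3 edges.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)  # identity-based equality/hash, so nodes can be stored in a set
class Node:
    """Hypothetical stand-in for a parsed token (mimics spaCy's text, pos_, head, children)."""
    text: str
    pos_: str
    head: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach `child` as a dependent of this node and return the child."""
        child.head = self
        self.children.append(child)
        return child


def find_attribute(entity, looks_like_attribute: Callable, max_distance: int = 3):
    """Breadth-first search outwards from `entity` over head and child edges.

    Returns the nearest token accepted by `looks_like_attribute`, or None if
    nothing is found within `max_distance` edges of the entity.
    """
    queue = deque([(entity, 0)])
    visited = {entity}
    while queue:
        token, distance = queue.popleft()
        if distance > 0 and looks_like_attribute(token):
            return token
        if distance == max_distance:
            continue  # do not walk any further from the entity
        neighbours = list(token.children)
        if token.head is not None:
            neighbours.append(token.head)
        for neighbour in neighbours:
            if neighbour not in visited:  # also skips a root whose head is itself
                visited.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


def is_adjective(token) -> bool:
    # Assumption: attributes are adjectives; real data will need a richer test.
    return token.pos_ == "ADJ"


# Hand-built (assumed) parse of "The Patient report is Positive for ABC disease".
root = Node("is", "AUX")
report = root.add(Node("report", "NOUN"))
report.add(Node("The", "DET"))
report.add(Node("Patient", "NOUN"))
positive = root.add(Node("Positive", "ADJ"))
prep = positive.add(Node("for", "ADP"))
disease = prep.add(Node("disease", "NOUN"))
abc = disease.add(Node("ABC", "PROPN"))

found = find_attribute(abc, is_adjective)
print(found.text if found else None)
```

With spaCy installed and an English pipeline loaded (for example `nlp = spacy.load("en_core_web_sm")`), a token taken from `nlp(sentence)` can be passed in place of `abc`, since spaCy tokens expose the same `head`, `children`, and `pos_` attributes. The parse, and therefore the result, will depend on the model.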
Further improvements to this solution would include:
Here is my baseline code: