Extracting Predefined Keywords from a Text, with Their Respective Weightage


Main Problem - I want to extract keywords from a text. I have used RAKE with NLTK, but it extracts every keyword in the text, whereas I only want the legal keywords.

I have a database of around 10,000 legal keywords, and I want the extracted keywords to come only from this database.

A keyword can also be a phrase such as 'Family Violence', i.e. a combination of the words provided in the database.
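One way to restrict extraction to the database is to skip RAKE for this step and look up word n-grams from the text directly in the keyword set, preferring the longest match so that 'Family Violence' wins over 'Family'. The sketch below is only an illustration: `LEGAL_KEYWORDS` is a tiny hypothetical stand-in for the real database, `extract_legal_keywords` is a made-up helper name, and the raw occurrence count is just one possible definition of "weightage".

```python
import re
from collections import Counter

# Hypothetical stand-in for the ~10,000-term legal keyword database.
LEGAL_KEYWORDS = {"family violence", "dowry", "allegation", "family", "murder"}

def extract_legal_keywords(text, keywords=LEGAL_KEYWORDS):
    """Count occurrences of known keywords (single or multi-word) in text."""
    vocab = {k.lower() for k in keywords}
    # Longest phrase length (in words) that we ever need to try.
    max_len = max((len(k.split()) for k in vocab), default=1)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter()
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in vocab:
                counts[candidate] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no keyword starts here
    return counts

sample = "The allegation of family violence and dowry demands split the family."
print(extract_legal_keywords(sample).most_common())
```

Because each lookup is a set membership test, the cost grows with the length of the text rather than with the size of the keyword database.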

Context - I am building a legal advisory AI that extracts the text from all the PDFs [legal documents] provided by the user, and I then want to extract all the legal terms from that text.

For example: Dowry, Allegation, Family, Murder, etc.

These keywords give the context of the case, and based on them the AI will suggest articles and helpful resources.

That is all for the context.

-------EXAMPLE-------

Sample Text -

"The deceased, namely, Sudha was married to Balvir Singh. The
marriage of the deceased with Balvir Singh was solemnised on 12.12.1997 .
In the wedlock a son was born. On 02.06.2007, father of the deceased,
namely"

Expected Output - ["Deceased","Marriage","Solemnised","Wedlock","Son","Father","12.12.1997","02.06.2007"]

Real Outcome - ['balvir singh', 'balvir singh', 'wedlock', 'sudha', 'son', 'solemnised', 'namely', 'namely', 'married', 'marriage', 'father', 'deceased', 'deceased', 'deceased', 'born', '2007', '1997', '12', '12', '06', '02']
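The dates in the real outcome come back in pieces ('12', '12', '1997'), apparently because the tokenizer splits on the periods. One possible workaround is to pull the dates out with a regular expression before any keyword extraction. This is a sketch that assumes every date is written as DD.MM.YYYY, as in the sample text; `DATE_PATTERN` and `extract_dates` are illustrative names.

```python
import re

# Assumes dates always look like DD.MM.YYYY, as in the sample text.
DATE_PATTERN = re.compile(r"\b\d{2}\.\d{2}\.\d{4}\b")

def extract_dates(text):
    """Return all DD.MM.YYYY strings in the order they appear."""
    return DATE_PATTERN.findall(text)

text = (
    "The marriage of the deceased with Balvir Singh was solemnised on 12.12.1997 . "
    "In the wedlock a son was born. On 02.06.2007, father of the deceased, namely"
)
print(extract_dates(text))
```

The pattern only checks the shape of the string, not whether the day and month are valid calendar values.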

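from rake_nltk import Rake
import nltk

# RAKE relies on NLTK's stopword list and tokenizer data (newer NLTK releases may ask for "punkt_tab").
nltk.download("stopwords")
nltk.download("punkt")

r = Rake()  # default English stopwords and punctuation
r.extract_keywords_from_text("The deceased, namely, Sudha was married to Balvir Singh. The marriage of the deceased with Balvir Singh was solemnised on 12.12.1997  In the wedlock a son was born. On 02.06.2007, father of the deceased, namely")
print(r.get_ranked_phrases())  # phrases ordered from highest to lowest score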
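If RAKE stays in the pipeline, its output could be post-filtered against the keyword database, with duplicates collapsed into counts. The sketch below applies that idea to the ranked phrases listed under "Real Outcome"; `LEGAL_KEYWORDS` is a hypothetical subset of the database and `filter_phrases` is an illustrative helper, not part of rake_nltk.

```python
from collections import Counter

# Hypothetical subset of the legal keyword database.
LEGAL_KEYWORDS = {"deceased", "marriage", "solemnised", "wedlock", "son", "father"}

def filter_phrases(phrases, keywords=LEGAL_KEYWORDS):
    """Keep only phrases present in the keyword set, with occurrence counts."""
    vocab = {k.lower() for k in keywords}
    return Counter(p.lower() for p in phrases if p.lower() in vocab)

# The ranked phrases reported in the question.
rake_output = ['balvir singh', 'balvir singh', 'wedlock', 'sudha', 'son',
               'solemnised', 'namely', 'namely', 'married', 'marriage',
               'father', 'deceased', 'deceased', 'deceased', 'born',
               '2007', '1997', '12', '12', '06', '02']
print(filter_phrases(rake_output))
```

This only keeps phrases that RAKE happened to produce as exact matches, so multi-word keywords that RAKE splits or merges differently would be missed.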
