Information Extraction and Relation Extraction with Stanford NLP for Python


How do I extract the names of companies from a set of documents using Stanford CoreNLP for Python?

Here is a sample of my data:

‘3Trucks Inc (‘3Trucks’ or the Company) is a tech-enabled long-haul B2B digital platform matching cargo owners with long-haul freight needs and truck owners who can service them, through its internally-developed digital platform.founded in 2016, 3Trucks is headquartered in California and has leased offices in Boston and Florida. Some of their top clients are, Google,IBM and Nokia

3Trucks was founded in 2010, with Mr. Mark Robert as its CEO and John Mclean as a Partner and CTO.'

The output I want for information extraction:

3Truck

The output I want for relation extraction:

('3truck', founded '2010'),
('John Mclean', 'Partner')
('3truck',client 'Google')

There are 2 answers

Naga kiran

Named entity recognition (NER) is normally used for this kind of task, but a pretrained NER model can only classify entities into a fixed set of categories, such as PERSON or ORGANIZATION. For example, with NLTK:

from nltk import word_tokenize, pos_tag, ne_chunk
from nltk.chunk import tree2conlltags

# Requires the NLTK tokenizer, POS tagger, and NE chunker data (see nltk.download).
sentence = "Mark and John are working at Google."

# Tokenize, POS-tag, chunk named entities, then flatten the tree to (token, POS, IOB) triples.
print(tree2conlltags(ne_chunk(pos_tag(word_tokenize(sentence)))))
# Output:
# [('Mark', 'NNP', 'B-PERSON'),
#  ('and', 'CC', 'O'), ('John', 'NNP', 'B-PERSON'),
#  ('are', 'VBP', 'O'), ('working', 'VBG', 'O'),
#  ('at', 'IN', 'O'), ('Google', 'NNP', 'B-ORGANIZATION'),
#  ('.', '.', 'O')]

For your application, you will need to train a named entity recognition model on data from your own domain (see Training NER).
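To get from the (token, POS, IOB) triples printed above to a list of company names, the tagged tokens can be grouped back into entity strings and filtered by label. The following is only a minimal sketch: `entities_from_iob` is a hypothetical helper (not part of NLTK), and it assumes the tags follow the B-/I-/O scheme shown in the output.

```python
def entities_from_iob(tagged, label):
    """Collect entity strings with the given label from (token, pos, iob) triples."""
    entities, current = [], []
    for token, _pos, iob in tagged:
        if iob == f"B-{label}":
            # A new entity starts; flush any entity still being built.
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif iob == f"I-{label}" and current:
            # Continuation of the entity currently being built.
            current.append(token)
        else:
            # Any other tag ends the current entity, if there is one.
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


# The triples from the NLTK output shown above.
tagged = [
    ("Mark", "NNP", "B-PERSON"), ("and", "CC", "O"),
    ("John", "NNP", "B-PERSON"), ("are", "VBP", "O"),
    ("working", "VBG", "O"), ("at", "IN", "O"),
    ("Google", "NNP", "B-ORGANIZATION"), (".", ".", "O"),
]

organizations = entities_from_iob(tagged, "ORGANIZATION")
people = entities_from_iob(tagged, "PERSON")
print(organizations)  # ['Google']
print(people)         # ['Mark', 'John']
```

This only recovers entities the tagger already labels. Names such as 3Trucks may not be tagged as organizations by a general-purpose model, which is why training on your own data may still be needed.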

Shahul Es

It is pretty simple. You can use spaCy NER (named entity recognition) for this task. It ships with a set of pretrained models that identify different entity types.
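As a rough sketch of that approach, the snippet below collects the entities that a spaCy pipeline labels as ORG. It assumes spaCy is installed and that the small English model `en_core_web_sm` has been downloaded (`python -m spacy download en_core_web_sm`). The helper `organization_names` and the `main` wrapper are hypothetical names chosen for illustration, and the sample text is a lightly cleaned excerpt of the question's data.

```python
def organization_names(doc):
    """Return unique ORG entity texts from a spaCy Doc, in order of first appearance."""
    names = []
    for ent in doc.ents:
        # "ORG" is the label spaCy's English models use for organizations.
        if ent.label_ == "ORG" and ent.text not in names:
            names.append(ent.text)
    return names


def main():
    # Imported here so the helper above has no hard dependency on spaCy.
    import spacy

    # Assumption: the en_core_web_sm model is installed.
    nlp = spacy.load("en_core_web_sm")
    text = (
        "3Trucks was founded in 2010, with Mr. Mark Robert as its CEO and "
        "John Mclean as a Partner and CTO. Some of their top clients are "
        "Google, IBM and Nokia."
    )
    doc = nlp(text)
    # Show every entity the model found, then only the organizations.
    print([(ent.text, ent.label_) for ent in doc.ents])
    print(organization_names(doc))
```

Call `main()` once the model is installed. Which spans are tagged as ORG depends on the model and its version, so the result should be checked against your documents. NER alone yields the entities; relations such as (company, founded, year) need an additional step on top of it.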