Search for job titles in an article using spaCy or NLTK


I'm new to NLP and have recently been playing with NLTK and spaCy. However, I could not find a way to search an article for job titles (e.g. product manager, chief marketing officer).

For example, I have 1,000 articles and want to find all the articles that mention the job titles I am interested in.
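If the titles you care about are known in advance, plain dictionary matching may already be enough to filter the articles, with no model training. The sketch below swaps in a different technique (regular expressions from Python's standard library, not spaCy or NLTK; spaCy's PhraseMatcher does a similar job over tokens). TITLES_OF_INTEREST, build_title_pattern and filter_articles are hypothetical names. It only finds titles that are on the list and cannot tell whether a word like "manager" is really used as a job title.

```python
import re
from typing import Dict, List

# Hypothetical titles of interest; replace with your own list.
TITLES_OF_INTEREST = ["product manager", "chief marketing officer", "software engineer"]


def build_title_pattern(titles: List[str]) -> "re.Pattern[str]":
    """Compile one case-insensitive regex matching any title as whole words."""
    # Words inside a title may be separated by any whitespace, including line breaks.
    alternatives = [r"\s+".join(re.escape(word) for word in t.split()) for t in titles]
    # Try longer titles first so e.g. "senior product manager" beats "product manager".
    alternatives.sort(key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def filter_articles(articles: Dict[str, str], titles: List[str]) -> Dict[str, List[str]]:
    """Map each article id that mentions a title to its sorted, normalized matches."""
    pattern = build_title_pattern(titles)
    hits: Dict[str, List[str]] = {}
    for article_id, text in articles.items():
        # Lowercase and collapse whitespace so "product\nmanager" -> "product manager".
        found = {" ".join(m.group(0).lower().split()) for m in pattern.finditer(text)}
        if found:
            hits[article_id] = sorted(found)
    return hits


sample_articles = {
    "a1": "She was promoted to Chief Marketing Officer last year.",
    "a2": "The weather was nice all week.",
    "a3": "Our product\nmanager talked to a software engineer.",
}
print(filter_articles(sample_articles, TITLES_OF_INTEREST))
# → {'a1': ['chief marketing officer'], 'a3': ['product manager', 'software engineer']}
```

Compiling a single alternation keeps the scan to one pass per article, which matters more than usual when the list of titles grows into the hundreds.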

Also, which entity type do job titles fall under? I checked https://spacy.io/docs/usage/entity-recognition and did not see it there. Is there a plan to add it?

Thanks.



Answer by joel

A "Job Title" entity type is not supported by spaCy's NER, as Nathan also stated. However, you can train a custom named entity type for your use case. Here is the official documentation link; it has a step-by-step guide to training spaCy's NER.

You would need labeled data to train your NER. As a rough guide, you would need at least 4,000-5,000 examples for training and 2,000 examples for testing. The more training data you have, the better the NER will perform.
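Labeling thousands of sentences by hand is slow. One way to bootstrap, offered here as an assumption and not as a step from the spaCy guide, is to annotate sentences automatically from a seed list of known titles and then review the results by hand. label_text and split_examples are hypothetical helpers: the first produces (text, {"entities": [(start, end, label)]}) tuples with character offsets (end exclusive), the second makes a reproducible train/test split (test_fraction=0.3 roughly matches the 4,000-5,000 / 2,000 ratio suggested above). Note that a model trained only on dictionary-derived labels may mostly learn the dictionary itself.

```python
import random
import re
from typing import Dict, List, Sequence, Tuple

# (text, {"entities": [(start_char, end_char, label)]}), end_char exclusive.
Example = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def label_text(text: str, titles: Sequence[str], label: str = "JOBTITLE") -> Example:
    """Annotate every occurrence of a known title with its character offsets."""
    entities: List[Tuple[int, int, str]] = []
    # Longer titles first, so a shorter title nested inside a longer one is skipped.
    for title in sorted(titles, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        for m in pattern.finditer(text):
            # NER entity spans may not overlap, so keep only non-overlapping matches.
            if all(m.end() <= start or m.start() >= end for start, end, _ in entities):
                entities.append((m.start(), m.end(), label))
    return text, {"entities": sorted(entities)}


def split_examples(examples: Sequence[Example], test_fraction: float = 0.3, seed: int = 0):
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical seed list and sentences, for illustration only.
seed_titles = ["software engineer", "marketing officer", "chief marketing officer"]
sentences = ["I work as a software engineer.", "Our Chief Marketing Officer left."]
examples = [label_text(s, seed_titles) for s in sentences]
print(examples)
# → [('I work as a software engineer.', {'entities': [(12, 29, 'JOBTITLE')]}),
#    ('Our Chief Marketing Officer left.', {'entities': [(4, 27, 'JOBTITLE')]})]
```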

Here is some sample training data.

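```python
# Each example is (text, {'entities': [(start_char, end_char, label)]}),
# where end_char is exclusive, so text[start_char:end_char] is the entity.
TRAIN_DATA = [
    ('Who is Shaka Khan?', {
        'entities': [(7, 17, 'PERSON')]
    }),
    ('I like London and Berlin.', {
        'entities': [(7, 13, 'LOC'), (18, 24, 'LOC')]
    }),
    # New custom label: text[12:29] == 'software engineer'
    ('I work as a software engineer.', {
        'entities': [(12, 29, 'JOBTITLE')]
    }),
]
```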
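Character offsets are easy to get off by one, and a span that starts or ends on a space will not line up with spaCy's tokens. A quick sanity check is to look at the substring each span actually covers. check_offsets below is a hypothetical helper and only a heuristic; in spaCy v3, Doc.char_span returns None for offsets that do not align with token boundaries, which is a stricter test. The sample contains a deliberately shifted span to show what a problem looks like.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, Dict[str, List[Tuple[int, int, str]]]]


def check_offsets(data: List[Example]) -> List[str]:
    """List spans that are out of range or start/end on whitespace."""
    problems = []
    for text, annotations in data:
        for start, end, label in annotations["entities"]:
            if not 0 <= start < end <= len(text):
                problems.append(f"{text!r}: ({start}, {end}, {label!r}) is out of range")
                continue
            span = text[start:end]
            # A span like " software" usually means the offsets are shifted by one.
            if span != span.strip():
                problems.append(f"{text!r}: {label} span {span!r} has stray whitespace")
    return problems


sample = [
    ("Who is Shaka Khan?", {"entities": [(7, 17, "PERSON")]}),
    # Deliberately misaligned: text[9:18] == " software"
    ("I work as software engineer.", {"entities": [(9, 18, "JOBTITLE")]}),
]
for problem in check_offsets(sample):
    print(problem)
```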
Answer by Dekel

Stanford NER supports job titles as the TITLE entity type, though not perfectly. See the demo page at http://corenlp.run/
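A minimal sketch of pulling TITLE mentions out of CoreNLP's JSON output, assuming you run your own CoreNLP server locally on its default port 9000 rather than sending bulk traffic to the public demo. The URL and the helper names annotate and extract_titles are assumptions, not part of the answer. The sketch reads the per-token "ner" field and joins consecutive TITLE tokens, so two titles directly next to each other would be merged; the server's "entitymentions" output may be a better source if your version provides it. The mock response is hand-written, not real server output.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Assumption: your own CoreNLP server, started locally on its default port 9000.
CORENLP_URL = "http://localhost:9000/"


def annotate(text: str, url: str = CORENLP_URL) -> dict:
    """POST raw text to a CoreNLP server and return its JSON annotation."""
    props = {"annotators": "tokenize,ssplit,pos,lemma,ner", "outputFormat": "json"}
    query = urllib.parse.urlencode({"properties": json.dumps(props)})
    request = urllib.request.Request(url + "?" + query, data=text.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_titles(annotation: dict) -> List[str]:
    """Join runs of consecutive tokens tagged TITLE into title strings."""
    titles = []
    for sentence in annotation.get("sentences", []):
        current: List[str] = []
        for token in sentence.get("tokens", []):
            if token.get("ner") == "TITLE":
                current.append(token["word"])
            elif current:
                titles.append(" ".join(current))
                current = []
        if current:  # a title at the very end of the sentence
            titles.append(" ".join(current))
    return titles


# Hand-written stand-in for a server response (only the fields used above).
mock_annotation = {"sentences": [{"tokens": [
    {"word": "Jane", "ner": "PERSON"},
    {"word": "is", "ner": "O"},
    {"word": "chief", "ner": "TITLE"},
    {"word": "executive", "ner": "TITLE"},
    {"word": "officer", "ner": "TITLE"},
    {"word": ".", "ner": "O"},
]}]}
print(extract_titles(mock_annotation))  # → ['chief executive officer']
```

With a server running, extract_titles(annotate(article_text)) would return the titles found in one article, which you could then compare against the titles you are interested in.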