import nltk

doc = '''Andrew Yan-Tak Ng is a Chinese American computer scientist.He is the former chief scientist at Baidu, where he led the company's
Artificial Intelligence Group. He is an adjunct professor (formerly associate professor) at Stanford University. Ng is also the co-founder
and chairman at Coursera, an online education platform. Andrew was born in the UK on 27th Sep 2.30pm 1976. His parents were both from Hong Kong.'''
# tokenize doc
tokenized_doc = nltk.word_tokenize(doc)
# POS-tag the tokens, then run NLTK's named entity chunker on them
tagged_sentences = nltk.pos_tag(tokenized_doc)
ne_chunked_sents = nltk.ne_chunk(tagged_sentences)
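The extraction step itself is not shown in the post. A minimal sketch of how the (text, label) pairs could be pulled out of the chunk tree, assuming the tree is the result of nltk.ne_chunk; `extract_entities` is a hypothetical helper name, not something NLTK provides:

```python
def extract_entities(chunk_tree):
    """Collect (entity text, entity label) pairs from an nltk.ne_chunk result."""
    entities = []
    for node in chunk_tree:
        # Named-entity chunks are labelled subtrees; ordinary tokens are
        # plain (word, pos_tag) tuples and have no label() method.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities
```

Calling extract_entities(ne_chunked_sents) on the tree built above would give a flat list of entities in document order.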
When I process the result and extract the chunks, I only get: [('Andrew', 'PERSON'), ('Chinese', 'GPE'), ('American', 'GPE'), ('Baidu', 'ORGANIZATION'), ("company's Artificial Intelligence Group", 'ORGANIZATION'), ('Stanford University', 'ORGANIZATION'), ('Coursera', 'ORGANIZATION'), ('Andrew', 'PERSON'), ('UK', 'ORGANIZATION'), ('Hong Kong', 'GPE')]
I need to extract the time and date as well. How can I do that? Thank you.
You need a more sophisticated tagger, such as the Stanford Named Entity Tagger. Once you have it installed and configured, you can run it:
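The original snippet is not preserved here, so what follows is only a rough sketch of what such a run might look like through NLTK's StanfordNERTagger wrapper. It assumes Java is installed, that the Stanford NER jar and a model that knows DATE and TIME labels (for example the 7-class english.muc.7class.distsim.crf.ser.gz) have been downloaded, and that you pass your own paths. `tag_with_stanford` and `group_entities` are hypothetical helper names.

```python
from itertools import groupby


def tag_with_stanford(text, model_path, jar_path):
    """Tag each token of `text` using NLTK's wrapper around Stanford NER.

    model_path and jar_path are placeholders for wherever the Stanford NER
    files live on your machine (an assumption, not fixed locations).
    """
    # Imported lazily so the grouping helper below works without NLTK.
    import nltk
    from nltk.tag import StanfordNERTagger

    tagger = StanfordNERTagger(model_path, jar_path, encoding="utf-8")
    return tagger.tag(nltk.word_tokenize(text))


def group_entities(tagged_tokens):
    """Merge runs of adjacent tokens that share the same non-'O' label."""
    entities = []
    for label, run in groupby(tagged_tokens, key=lambda pair: pair[1]):
        if label != "O":  # 'O' marks tokens outside any entity
            entities.append((" ".join(token for token, _ in run), label))
    return entities
```

You would then call something like group_entities(tag_with_stanford(doc, model_path, jar_path)) with the paths from your own installation.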
Where the output would be:
You will probably run into some issues when trying to install and set up everything, but I think it's worth the hassle.
Let me know if it helps.