Which POS tagger is fast, accurate, and licensed for commercial use? For testing I used the Stanford POS tagger, which works well, but it is slow and its license is a problem for me.
What is the fastest and most accurate POS tagger in Python (with a commercial license)?
Asked by Regina
There are 2 answers
You can use nltk.
>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]
Explanation:
word_tokenize first splits a sentence into words. A sentence tokenizer is also available.
Then pos_tag tags the list of words with their parts of speech.
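Since pos_tag returns a plain list of (word, tag) tuples, the result is easy to post-process with ordinary Python. As a minimal sketch, the snippet below groups the words from the output above by their tag. The helper name words_by_tag is made up for this example and is not part of NLTK.

```python
from collections import defaultdict

# Output copied from the nltk.pos_tag call above.
tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
          ('completely', 'RB'), ('different', 'JJ')]


def words_by_tag(tagged_words):
    """Group words under their POS tag, keeping the original word order.

    Hypothetical helper for illustration; not an NLTK function.
    """
    groups = defaultdict(list)
    for word, tag in tagged_words:
        groups[tag].append(word)
    return dict(groups)


groups = words_by_tag(tagged)
print(groups['RB'])  # the two adverbs: ['now', 'completely']
```

The same pattern works for picking out only nouns or adjectives, e.g. groups.get('NN', []).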
More information available here and here.
See this answer for a long and detailed list of POS Taggers in Python.
NLTK is not perfect. In fact, no model is perfect.
You may first need to run
>>> import nltk; nltk.download()
to download the tokenizer and tagger data.
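Because no tagger is perfect, and the question asks about both speed and accuracy, it can help to measure candidates on a small sample of your own text before picking one. The sketch below is one way to do that, not a standard benchmark. It assumes a tagger is any callable that takes a list of words and returns (word, tag) tuples, which is the shape nltk.pos_tag uses. The names evaluate_tagger, noun_baseline and GOLD are invented for this example, and the gold sample simply reuses the tags shown above for illustration.

```python
import time
from typing import Callable, List, Sequence, Tuple

TaggedSentence = List[Tuple[str, str]]
Tagger = Callable[[List[str]], TaggedSentence]


def evaluate_tagger(tagger: Tagger,
                    gold: Sequence[TaggedSentence]) -> Tuple[float, float]:
    """Return (token-level accuracy, tokens per second) for `tagger`.

    `gold` is a list of sentences, each a list of (word, correct_tag) pairs.
    Only the time spent inside the tagger call is measured.
    """
    correct = 0
    total = 0
    elapsed = 0.0
    for sentence in gold:
        words = [word for word, _ in sentence]
        start = time.perf_counter()
        predicted = tagger(words)
        elapsed += time.perf_counter() - start
        # Tokens the tagger failed to return count as wrong.
        total += len(sentence)
        correct += sum(gold_tag == predicted_tag
                       for (_, gold_tag), (_, predicted_tag)
                       in zip(sentence, predicted))
    accuracy = correct / total if total else 0.0
    speed = total / elapsed if elapsed > 0 else float("inf")
    return accuracy, speed


def noun_baseline(words: List[str]) -> TaggedSentence:
    """Hypothetical stand-in tagger: labels every token as a noun."""
    return [(word, "NN") for word in words]


# Illustrative gold sample; replace with hand-checked sentences of your own.
GOLD = [[("And", "CC"), ("now", "RB"), ("for", "IN"),
         ("something", "NN"), ("completely", "RB"), ("different", "JJ")]]

accuracy, tokens_per_second = evaluate_tagger(noun_baseline, GOLD)
print(f"accuracy: {accuracy:.1%}")
```

To evaluate NLTK itself, pass nltk.pos_tag in place of noun_baseline. A single sentence is far too little data for a meaningful number, so use a sample that is representative of your real input.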
I've had some success combining NLTK's part-of-speech tagging with TextBlob's. Both are publicly available (or at least have a decent public version).
http://textanalysisonline.com/nltk-pos-tagging
https://textblob.readthedocs.io/en/dev/
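The answer does not say how the two taggers were combined. One possible approach, sketched here purely as an assumption, is to keep one tagger's tags and flag the tokens where the other disagrees, so that those can be reviewed or handled separately. The function combine_tags is hypothetical, and the second tag list below is hand-written sample data, not real TextBlob output.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]


def combine_tags(primary: Tagged, secondary: Tagged) -> List[Tuple[str, str, bool]]:
    """Return (word, primary_tag, agrees) for every token.

    Hypothetical helper: keeps the primary tagger's tag and records whether
    the secondary tagger assigned the same one. Both inputs must cover the
    same tokens in the same order.
    """
    primary_words = [word for word, _ in primary]
    secondary_words = [word for word, _ in secondary]
    if primary_words != secondary_words:
        raise ValueError("the two taggers produced different tokenizations")
    return [(word, tag, tag == other_tag)
            for (word, tag), (_, other_tag) in zip(primary, secondary)]


# Tags from an NLTK run on the sample sentence.
nltk_tags = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
             ('completely', 'RB'), ('different', 'JJ')]
# Hand-written sample standing in for a second tagger's output (assumption).
other_tags = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'VBG'),
              ('completely', 'RB'), ('different', 'JJ')]

combined = combine_tags(nltk_tags, other_tags)
disputed = [word for word, _, agrees in combined if not agrees]
print(disputed)  # ['something']
```

The tokenization check matters in practice, because two libraries may split or filter tokens differently, and in that case the lists need to be aligned before comparing tags.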