What is the fastest and most accurate POS tagger in Python (with a commercial license)?


Which POS tagger is fast and accurate and has a license that permits commercial use? For testing, I used the Stanford POS tagger, which works well, but it is slow and its license is a problem for me.


There are 2 answers

Laughing Horse On

I've had good results combining NLTK's part-of-speech tagging with TextBlob's. Both are publicly available (or at least have a decent public version available).

http://textanalysisonline.com/nltk-pos-tagging

https://textblob.readthedocs.io/en/dev/

noɥʇʎԀʎzɐɹƆ On

You can use nltk.

>>> import nltk
>>> text = nltk.word_tokenize("And now for something completely different")
>>> nltk.pos_tag(text)
[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
('completely', 'RB'), ('different', 'JJ')]

Explanation:

word_tokenize first splits the sentence into word tokens. A sentence tokenizer (sent_tokenize) is also available.

Then pos_tag takes the list of tokens and returns a list of (word, tag) pairs, one part-of-speech tag per token.
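
The result of pos_tag is plain Python data, so it is easy to post-process. Here is a minimal sketch that filters and groups the tagged tokens. It uses the output shown above as literal input so it runs without NLTK installed; the helper names words_with_tag and group_by_tag are made up for illustration and are not part of NLTK.

```python
from collections import defaultdict

# Output of nltk.pos_tag copied from the example above, used as literal
# input so this sketch runs without NLTK installed.
tagged = [('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
          ('completely', 'RB'), ('different', 'JJ')]


def words_with_tag(tagged_tokens, prefix):
    """Return the words whose tag starts with `prefix` (hypothetical helper)."""
    return [word for word, tag in tagged_tokens if tag.startswith(prefix)]


def group_by_tag(tagged_tokens):
    """Map each tag to the words that received it, in order (hypothetical helper)."""
    groups = defaultdict(list)
    for word, tag in tagged_tokens:
        groups[tag].append(word)
    return dict(groups)


adverbs = words_with_tag(tagged, 'RB')
by_tag = group_by_tag(tagged)
print(adverbs)
print(by_tag['JJ'])
```

Matching on a prefix rather than the full tag is a small convenience: with the Penn Treebank tags used here, 'NN' also catches 'NNS' and 'NNP', and 'VB' catches all verb forms.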

More information available here and here.

See this answer for a long and detailed list of POS Taggers in Python.

NLTK is not perfect. In fact, no model is perfect.
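
Since every tagger trades speed against accuracy, it can help to measure both on your own data before choosing one. Below is a rough sketch, not a rigorous benchmark: evaluate_tagger is a hypothetical helper, always_noun is a stand-in tagger used only so the example runs without any library installed, and the "gold" data simply reuses the tagged sentence from above. For a real comparison you would pass in something like nltk.pos_tag and a hand-annotated corpus.

```python
import time


def evaluate_tagger(tag_fn, gold_sentences):
    """Return (accuracy, tokens_per_second) for tag_fn on gold data.

    tag_fn: callable taking a list of words and returning (word, tag)
            pairs, e.g. nltk.pos_tag.
    gold_sentences: list of sentences, each a list of (word, tag) pairs.
    """
    correct = 0
    total = 0
    elapsed = 0.0
    for sentence in gold_sentences:
        words = [word for word, _ in sentence]
        # Time only the tagging call, not the bookkeeping around it.
        start = time.perf_counter()
        predicted = tag_fn(words)
        elapsed += time.perf_counter() - start
        for (_, gold_tag), (_, pred_tag) in zip(sentence, predicted):
            correct += gold_tag == pred_tag
            total += 1
    accuracy = correct / total if total else 0.0
    speed = total / elapsed if elapsed > 0 else float('inf')
    return accuracy, speed


def always_noun(words):
    """Stand-in tagger (assumption for illustration): tags everything 'NN'."""
    return [(word, 'NN') for word in words]


# Toy gold data: the tagged sentence from the example above.
gold = [[('And', 'CC'), ('now', 'RB'), ('for', 'IN'), ('something', 'NN'),
         ('completely', 'RB'), ('different', 'JJ')]]

accuracy, speed = evaluate_tagger(always_noun, gold)
print(f"accuracy={accuracy:.2f}, tokens/sec={speed:.0f}")
```

Because the helper only needs a callable that maps a list of words to (word, tag) pairs, the same function can be reused to compare several taggers on identical input.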


You may need to first run

>>> import nltk; nltk.download()

to download the tokenizer and tagger data that word_tokenize and pos_tag rely on.