I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used.
I have tagged the text, but I am not sure how to go any further:
import nltk

tokens = nltk.word_tokenize(text.lower())
text = nltk.Text(tokens)
tags = nltk.pos_tag(text)
How can I save the counts for each part of speech into a variable?
The pos_tag function gives you back a list of (token, tag) pairs. If you are using Python 2.7 or later, you can do the counting simply with collections.Counter.
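A minimal sketch, assuming tags holds the (token, tag) pairs produced in the question:

from collections import Counter

# Count how often each POS tag occurs, ignoring the tokens themselves
counts = Counter(tag for word, tag in tags)
print(counts)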
To normalize the counts (giving you the proportion of each part of speech rather than a raw count), divide each count by the total.
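Continuing from the counts variable in the sketch above:

total = sum(counts.values())
# float() keeps the division from truncating on Python 2
proportions = dict((tag, float(count) / total) for tag, count in counts.items())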
Note that in older versions of Python (before 2.7), Counter is not available, so you'll have to implement the counting yourself.
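One way is a plain dict or collections.defaultdict (available since Python 2.5); a sketch, again assuming the tags list from the question:

from collections import defaultdict

# Tally each tag by hand instead of relying on Counter
counts = defaultdict(int)
for word, tag in tags:
    counts[tag] += 1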