I have multiple texts and I would like to create profiles of them based on their usage of various parts of speech, like nouns and verbs. Basically, I need to count how many times each part of speech is used.
I have tagged the text, but I am not sure how to go any further:
import nltk

tokens = nltk.word_tokenize(text.lower())
text = nltk.Text(tokens)
tags = nltk.pos_tag(text)
How can I save the counts for each part of speech into a variable?
The pos_tag function gives you back a list of (token, tag) pairs. If you are using Python 2.7 or later, you can do the counting simply with collections.Counter.
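A minimal sketch, assuming tags holds the (token, tag) pairs produced in the question:

from collections import Counter

# Count how often each POS tag occurs, ignoring the tokens themselves
counts = Counter(tag for word, tag in tags)
print(counts)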
To normalize the counts (giving you the proportion of each part of speech rather than a raw count), divide each count by the total.
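Continuing from the counts variable in the sketch above:

total = sum(counts.values())
# float() keeps the division from truncating on Python 2
proportions = dict((tag, float(count) / total) for tag, count in counts.items())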
Note that in older versions of Python (before 2.7), Counter is not available, so you'll have to implement the counting yourself.
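One way is a plain dict or collections.defaultdict (available since Python 2.5); a sketch, again assuming the tags list from the question:

from collections import defaultdict

# Tally each tag by hand instead of relying on Counter
counts = defaultdict(int)
for word, tag in tags:
    counts[tag] += 1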