Q. Finding POS-tag frequency per sentence. Forgive me, I am new to Python (I started about 4 months ago). I was able to figure out how to apply POS tags to the words in the document:
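import nltk
from nltk.tokenize import PunktSentenceTokenizer

train_text = "SOME TEXT 1"   # placeholder for the text used to train the tokenizer
sample_text = "SOME TEXT 2"  # placeholder for the text to analyze

# Train a sentence tokenizer, then split the sample into sentences
custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
tokenized = custom_sent_tokenizer.tokenize(sample_text)

def process_content():
    try:
        for i in tokenized:
            # Tokenize and tag one sentence at a time
            words = nltk.word_tokenize(i)
            tagged = nltk.pos_tag(words)
            print(tagged)
    except Exception as e:
        print(str(e))

process_content()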
Now here is where I have no idea how to proceed. At this point I am interested in calculating the frequency of certain POS tags per sentence. Once I have that, I want to plot the sentence length (number of words) against the number of those POS tags in each sentence. When I tried to do this, I was only able to find the frequency of the POS tags in the whole document relative to all the words in the document. Even though I have already tokenized into sentences, I still get the whole document when I analyze. Help, this is driving me nuts!
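To make the goal concrete, here is a minimal sketch of per-sentence counting. It assumes the tagged output is kept as one list of (word, tag) pairs per sentence instead of being pooled across the document; the analysis code is not shown above, so whether pooling is the actual cause is a guess. The helper name `pos_counts_per_sentence`, the `wanted_tags` parameter, and the hand-written example data are all hypothetical, and the example data only stands in for what `nltk.pos_tag` would return.

```python
from collections import Counter

# Hypothetical helper (not part of NLTK). `tagged_sentences` holds one entry
# per sentence; each entry is a list of (word, tag) pairs for that sentence.
def pos_counts_per_sentence(tagged_sentences, wanted_tags):
    rows = []
    for tagged in tagged_sentences:
        # Count tags within this sentence only
        tag_counts = Counter(tag for _, tag in tagged)
        rows.append({
            # Number of tokens; punctuation tokens are included in this count
            "length": len(tagged),
            # Total occurrences of the tags of interest in this sentence
            "count": sum(tag_counts[t] for t in wanted_tags),
            # Per-tag breakdown for the tags of interest
            "by_tag": {t: tag_counts[t] for t in wanted_tags},
        })
    return rows

# Hand-written stand-in for tagged output of two sentences (assumed data)
example = [
    [("The", "DT"), ("cat", "NN"), ("sat", "VBD"), (".", ".")],
    [("Dogs", "NNS"), ("bark", "VBP"), ("loudly", "RB"),
     ("at", "IN"), ("cats", "NNS"), (".", ".")],
]

rows = pos_counts_per_sentence(example, wanted_tags=("NN", "NNS"))
for row in rows:
    print(row["length"], row["count"], row["by_tag"])
```

With the code in the question, the input could be built as `[nltk.pos_tag(nltk.word_tokenize(s)) for s in tokenized]`. The resulting `length` and `count` values are then one pair per sentence, which is the shape a scatter plot needs, for example with matplotlib's `plt.scatter`.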