Finding POS-TAG Frequency in sentences of a corpus


Q. Finding POS-tag frequency per sentence. Forgive me, I am new to Python; I started about 4 months ago. I was able to figure out how to apply POS tags to the words in the document:

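    import nltk
    from nltk.tokenize import PunktSentenceTokenizer

    train_text = "SOME TEXT 1"   # placeholder: text used to train the sentence tokenizer
    sample_text = "SOME TEXT 2"  # placeholder: text to analyze

    # Train a Punkt sentence tokenizer, then split the sample text into sentences
    custom_sent_tokenizer = PunktSentenceTokenizer(train_text)
    tokenized = custom_sent_tokenizer.tokenize(sample_text)

    def process_content():
        try:
            for sentence in tokenized:
                words = nltk.word_tokenize(sentence)
                tagged = nltk.pos_tag(words)  # list of (word, tag) pairs for this sentence
                print(tagged)
        except Exception as e:
            print(str(e))

    process_content()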

This is where I have no idea how to proceed. I am interested in calculating the frequency of certain POS tags per sentence. Once I have that, I want to plot the sentence length (number of words) against the number of those POS tags in each sentence. When I tried this, I was only able to find the frequency of the POS tags across the whole document, relative to all the words in the document. Even though I have already tokenized into sentences, I still get whole-document results when I analyze. Help, this is driving me nuts!
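One possible way to get per-sentence counts is to keep each sentence's tagged list separate and build a `collections.Counter` for each one, instead of counting over one combined list. The sketch below is only an illustration under a few assumptions: the function name `tag_stats_per_sentence`, the `TARGET_TAGS` set (Penn Treebank noun tags), and the hand-written `example` data are all hypothetical stand-ins, and the input is assumed to have the shape that `nltk.pos_tag` returns, one list of `(word, tag)` pairs per sentence.

```python
from collections import Counter

# Hypothetical choice of tags to count: Penn Treebank noun tags.
TARGET_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def tag_stats_per_sentence(tagged_sentences, target_tags=TARGET_TAGS):
    """Return one (sentence_length, tag_counts, target_total) tuple per sentence."""
    stats = []
    for tagged in tagged_sentences:
        # Count tags for this sentence only, not for the whole document.
        counts = Counter(tag for _word, tag in tagged)
        target_total = sum(counts[t] for t in target_tags)
        # len(tagged) counts every token, punctuation included.
        stats.append((len(tagged), counts, target_total))
    return stats

# Hand-written stand-in for tagger output, one list per sentence.
example = [
    [("The", "DT"), ("cat", "NN"), ("sleeps", "VBZ"), (".", ".")],
    [("Dogs", "NNS"), ("chase", "VBP"), ("cats", "NNS"), ("daily", "RB"), (".", ".")],
]

stats = tag_stats_per_sentence(example)
for length, counts, total in stats:
    print(length, total, dict(counts))
```

To use this with the loop in the question, the `tagged` list for each sentence would be appended to a list rather than printed, and that list passed to the function. The sentence lengths and target totals can then be handed to a plotting call such as `matplotlib.pyplot.scatter(lengths, totals)`.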
