I want to take this tagged text (formatted as below) and find the average frequency of the POS tag DT in each sentence, e.g. DT appears in 1 of 3 words in sentence 1 and in 1 of 3 words in sentence 2. Then I want to add these fractions up and divide by the number of sentences in the text (2 in this case). This will give me the average appearance of DT per sentence.
from collections import Counter

# Each sentence is its own list of (word, tag) pairs
tagged_text = [
    [('A', 'DT'), ('hairy', 'NNS'), ('dog', 'NN')],
    [('The', 'DT'), ('mischievous', 'NNS'), ('elephant', 'NN')],
]

total = 0
for sentence in tagged_text:
    counts = Counter(tag for word, tag in sentence)
    total += counts['DT'] / len(sentence)  # fraction of DT tags in this sentence

average = total / len(tagged_text)  # average DT fraction per sentence
print(average)
The big problem for me is the eachSentence part, which I don't know how to get around (I don't know how to define what it is). I want this code to be applicable to hundreds of sentences in the same format. I know there are a lot of problems with the code, so if someone could correct them I would be very grateful.
I'm (also) not really sure what you are after. Perhaps you should try to structure your idea/requirements a bit more (in your head/on paper) before trying to put it into code. Based on your description and code, I can think of two possible figures that you're after, which can be obtained in the following way:
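The original code for this answer isn't shown here, but the two figures described can be sketched roughly as follows (variable names such as `tagged_sents` are my own, and I assume each sentence is given as a list of (word, tag) pairs):

```python
from collections import Counter

tagged_sents = [
    [('A', 'DT'), ('hairy', 'NNS'), ('dog', 'NN')],
    [('The', 'DT'), ('mischievous', 'NNS'), ('elephant', 'NN')],
]

# Figure 1: for each tag, the fraction of sentences it appears in.
# Using a set so each tag is counted at most once per sentence.
sentence_counts = Counter()
for sent in tagged_sents:
    sentence_counts.update({tag for word, tag in sent})
per_sentence = {tag: n / len(tagged_sents) for tag, n in sentence_counts.items()}

# Figure 2: the overall relative frequency of each tag across all tokens.
token_counts = Counter(tag for sent in tagged_sents for word, tag in sent)
n_tokens = sum(token_counts.values())
overall = {tag: n / n_tokens for tag, n in token_counts.items()}

print(per_sentence)  # every tag occurs in both sentences -> 1.0 each
print(overall)       # every tag is 2 of 6 tokens -> 1/3 each
```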
Resulting in
All three tags appear in both sentences, hence the likelihood of each one appearing in a sentence is 1. Each of the three tags also appears twice out of six tokens in total, hence a likelihood of 1/3 for it to appear (disregarding sentence distribution). But then again, I'm not sure if this is what you're after.