I have two DataFrames. The first has a cluster column with 550 unique cluster numbers and a second column holding the tokens that belong to each cluster. The second DataFrame contains 1000 documents, one paragraph (document) per row. How can I get the frequency of each cluster's tokens in the documents of the second DataFrame? Here is what I have so far:

    import nltk
    from nltk.tokenize import RegexpTokenizer

    tokenizer = RegexpTokenizer(r'\w+')

    def token_text(text):
        # Split the text into word tokens
        sent = []
        for each in tokenizer.tokenize(text):
            # if not each.isdigit():
            sent.append(each)
        return sent

    def count_word(word, sent):
        # Count how many times `word` occurs in the token list
        count = 0
        for item in sent:
            if item == word:
                count = count + 1
        return count

    def frequency_match(word, text):
        # Frequency of `word` in a single document
        sent = token_text(text)
        count = count_word(word, sent)
        return count
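
One possible way to extend this from a single word and a single document to every cluster and every document is to count all tokens in the corpus once and then look up each cluster's tokens. The sketch below is only an illustration under assumptions: it uses the standard library (`re` and `collections.Counter`) with the same `\w+` pattern instead of NLTK, matching is case-sensitive, and the function name `cluster_token_frequencies` and the toy data are hypothetical.

```python
import re
from collections import Counter

# Same pattern as RegexpTokenizer(r'\w+') in the question
TOKEN_RE = re.compile(r"\w+")

def cluster_token_frequencies(cluster_tokens, documents):
    """Return {cluster: {token: total count across all documents}}.

    cluster_tokens: dict mapping a cluster id to a list of tokens
    documents: iterable of document strings
    """
    # Tokenize and count the whole corpus once, instead of
    # re-tokenizing every document for every token.
    corpus_counts = Counter()
    for text in documents:
        corpus_counts.update(TOKEN_RE.findall(text))

    # A Counter returns 0 for tokens that never occur.
    return {
        cluster: {tok: corpus_counts[tok] for tok in tokens}
        for cluster, tokens in cluster_tokens.items()
    }

# Hypothetical toy data standing in for the two DataFrames
cluster_tokens = {0: ["cat", "dog"], 1: ["car"]}
documents = ["the cat saw a dog", "a dog chased the car", "cat cat"]

freqs = cluster_token_frequencies(cluster_tokens, documents)
print(freqs)  # {0: {'cat': 3, 'dog': 2}, 1: {'car': 1}}
```

With pandas, the two inputs could be built from the DataFrames, for example a dict from the cluster column to its token list and a list from the document column; the exact column names depend on the data and are not shown in the question.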
