Can I check the frequencies of predetermined words or phrases in document clustering using R?

I'm doing text mining with the "tm" package in R, and I can get word frequencies after generating a document-term matrix:

freq <- colSums(as.matrix(dtm))

ord <- order(freq)

freq[head(ord)]   
# abit   acal access accord across acsess     
#    1      1      1      1      1      1 

freq[tail(ord)]    
# direct   save  month   will  thank   list     
#    106    107    116    122    132    154 

This only gives me the word frequencies as a sorted list. Can I check the frequency of an individual word? Can I also check the frequency of a phrase? For example, how many times does the word "thank" occur in the corpus, or how often does the phrase "contact number" appear in it?

Many thanks for any hints and suggestions.

There is 1 answer

HOSS_JFL

I'll demonstrate this with the crude data set that ships with the tm package:

library(tm)
data(crude)
dtm <- as.matrix(DocumentTermMatrix(crude))

# find the column that contains the word "demand"
columnindices <- which(colnames(dtm) == "demand")

# how often does the word "demand" show up?
sum(dtm[, columnindices])
# [1] 6
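
Since the column sums of the matrix form a named vector, a single word can also be looked up directly by name. Below is a minimal sketch of that idea; the small matrix is a made-up stand-in for as.matrix(DocumentTermMatrix(...)) so that it runs without tm, and term_freq is a hypothetical helper, not a tm function.

```r
# Toy stand-in for as.matrix(DocumentTermMatrix(...)): rows are documents,
# columns are terms. The counts are made up for illustration.
dtm <- matrix(c(2, 0, 1,
                1, 3, 0),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("doc1", "doc2"), c("demand", "oil", "price")))

# Total frequency of every term, as a named numeric vector
freq <- colSums(dtm)

# Hypothetical helper: frequency of one term, or 0 if the term is not a column
term_freq <- function(freq, term) {
  if (term %in% names(freq)) unname(freq[term]) else 0
}

term_freq(freq, "demand")   # 3
term_freq(freq, "thank")    # 0, because "thank" is not a term in this toy matrix
```

Returning 0 for a missing term avoids the NA that plain name indexing would give when the word was dropped during preprocessing (for example by stemming or stop-word removal).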

If you want to do this with phrases, your dtm must contain those phrases as terms, not just the bag of single words that is used in most cases. If that data is available, the procedure is the same as for a single word.
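
To illustrate what "containing the phrases" means, here is a rough base-R sketch that turns each text into two-word phrases (bigrams) and counts one of them. The example texts and the bigrams helper are made up for illustration. With tm itself, the usual route is, as far as I know, to pass a custom tokenizer through the control argument of DocumentTermMatrix; that part is not shown here.

```r
# Made-up example texts standing in for the documents of a corpus
docs <- c("please send your contact number",
          "the contact number is on the list",
          "thank you for the number")

# Hypothetical helper: split one text into lowercase words and
# paste each word together with the word that follows it
bigrams <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words <- words[nzchar(words)]            # drop empty strings from the split
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

# All two-word phrases over all documents
all_bigrams <- unlist(lapply(docs, bigrams))

# Frequency of a single phrase
sum(all_bigrams == "contact number")   # 2
```

Once the phrases are terms of their own, counting one of them works exactly like counting a single word.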