I am trying to make a word cloud from a list of phrases, many of which are repeated, instead of from individual words. My data looks something like this, with one column of my data frame being a list of phrases.
df$names <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C")
I would like to make a word cloud where all of these names are treated as individual phrases whose frequency is displayed, not the words which make them up. The code I have been using looks like:
df.corpus <- Corpus(DataframeSource(data.frame(df$names)))
df.corpus <- tm_map(df.corpus, function(x) removeWords(x, stopwords("english")))
# turning that corpus into a term-document matrix
tdm <- TermDocumentMatrix(df.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]
# making a word cloud
png("wordcloud.png", width=1280,height=800)
wordcloud(d$word,d$freq, scale=c(8,.3),min.freq=2,max.words=100, random.order=T, rot.per=.15, colors="black", vfont=c("sans serif","plain"))
dev.off()
This creates a word cloud, but it is of each component word, not of the phrases. So I see the relative frequency of "A", "H", "John", etc. instead of the relative frequency of "Joseph A", "Mary A", etc., which is what I want.
I'm sure this isn't that complicated to fix, but I can't figure it out! I would appreciate any help.
Your difficulty is that each element of df$names is being treated as a "document" by the functions of tm. For example, the document John A contains the words John and A. It sounds like you want to keep the names as is and just count their occurrences; you can use table for that.
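A minimal sketch of that approach, using the sample names from the question. It assumes the wordcloud package is installed (the plotting step is skipped otherwise) and that passing whole phrases together with their counts is all that is needed, so tm is not involved. The min.freq = 1 setting is my own choice so that "Joseph A", which occurs only once, is not dropped.

```r
# Sample data from the question
df <- data.frame(
  names = c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C"),
  stringsAsFactors = FALSE
)

# Count how often each full phrase occurs; nothing is split into words
freq <- table(df$names)
d <- data.frame(word = names(freq), freq = as.vector(freq), stringsAsFactors = FALSE)

# Plot the phrases directly from their counts (only if wordcloud is available)
if (requireNamespace("wordcloud", quietly = TRUE)) {
  png("wordcloud.png", width = 1280, height = 800)
  wordcloud::wordcloud(d$word, d$freq, scale = c(8, .3), min.freq = 1,
                       max.words = 100, random.order = FALSE, rot.per = .15,
                       colors = "black")
  dev.off()
}
```

Because wordcloud() is given the words and frequencies explicitly, it draws each entry of d$word as a single unit, so "Mary A" stays together instead of being broken into "Mary" and "A".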