I am trying to make a word cloud from a list of phrases, many of which are repeated, instead of from individual words. My data looks something like this, with one column of my data frame being a list of phrases.
df$names <- c("John", "John", "Joseph A", "Mary A", "Mary A", "Paul H C", "Paul H C")
I would like to make a word cloud where all of these names are treated as individual phrases whose frequency is displayed, not the words which make them up. The code I have been using looks like:
df.corpus <- Corpus(DataframeSource(data.frame(df$names)))
df.corpus <- tm_map(df.corpus, function(x) removeWords(x, stopwords("english")))
#turning that corpus into a tDM
tdm <- TermDocumentMatrix(df.corpus)
m <- as.matrix(tdm)
v <- sort(rowSums(m),decreasing=TRUE)
d <- data.frame(word = names(v),freq=v)
pal <- brewer.pal(9, "BuGn")
pal <- pal[-(1:2)]
#making a wordcloud
png("wordcloud.png", width=1280,height=800)
wordcloud(d$word,d$freq, scale=c(8,.3),min.freq=2,max.words=100, random.order=T, rot.per=.15, colors="black", vfont=c("sans serif","plain"))
dev.off()
This creates a word cloud, but it is of each component word, not of the phrases. So I see the relative frequency of "A", "H", "John", etc. instead of the relative frequency of "Joseph A", "Mary A", etc., which is what I want.
I'm sure this isn't that complicated to fix, but I can't figure it out! I would appreciate any help.
Your difficulty is that each element of `df$names` is being treated as a "document" by the functions of `tm`. For example, the document `John A` contains the words `John` and `A`. It sounds like you want to keep the names as they are and just count their occurrences; you can use `table` for that.
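A minimal sketch of the counting step, using only base R. It assumes the phrases are held in a plain character vector; `names_vec` is a hypothetical stand-in for `df$names`, and `phrase_counts` is likewise just an illustrative name.

```r
# Sample data from the question: each element is one whole phrase.
# names_vec is a hypothetical stand-in for df$names.
names_vec <- c("John", "John", "Joseph A", "Mary A", "Mary A",
               "Paul H C", "Paul H C")

# Count how often each complete phrase occurs. table() compares whole
# strings, so "Mary A" is never split into "Mary" and "A".
phrase_counts <- table(names_vec)

# Build the same word/freq data frame the question's code expects,
# with the most frequent phrases first.
d <- data.frame(word = names(phrase_counts),
                freq = as.vector(phrase_counts),
                stringsAsFactors = FALSE)
d <- d[order(d$freq, decreasing = TRUE), ]

print(d)
```

The resulting `d$word` and `d$freq` can then be passed to the `wordcloud()` call from the question in place of the values derived from the term-document matrix. Because frequencies are supplied explicitly, the phrases should be drawn as given. Note that with `min.freq=2`, a phrase that occurs only once (here "Joseph A") would be left out of the cloud.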