Comparison word cloud query


I used the comparison.cloud function from the wordcloud package in R. The word 'good' appears in both Cat1 (27 times) and Cat2 (33 times), but in the word cloud it shows up only under Cat1 for some reason (probably because it is the first column).

Can you suggest how it can be tweaked to show every word in each category, even when the same word occurs in several categories? This is a significant finding for my dataset, and it defeats the purpose of a comparison cloud if it drops the most important word from Cat2.

Data looks like this matrix:

        Cat1    Cat2    Cat3
good      27      33       3
bad       10       6       4
...

Code is:

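# comparison word cloud
library(wordcloud)
library(RColorBrewer)

tdm = read.table("doc.csv")
nemo = ncol(tdm)  # number of colours; assumed here to be one per category

png("comparision_wordcloud.png", width=1280, height=800)
comparison.cloud(tdm, colors = brewer.pal(nemo, "Dark2"), use.r.layout=FALSE,
                 scale = c(4,.5), max.words = 1000, rot.per=.1, random.order = FALSE, title.size = 2)
dev.off()  # close the PNG device so the file is actually written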

Let me know if this has a quick fix.

There is 1 answer

Richard On

As I understand it, a comparison cloud works out which category each term is most characteristic of (here, 'good' is most characteristic of Cat1), so each term appears only once in a comparison cloud.

I haven't looked at the code behind the function, but I imagine it starts by calculating the average rate of occurrence of 'good' across all categories combined, and then its rate in each category in turn. From the difference between each category's rate and the overall average you can work out where the term should be placed in the cloud: the category with the largest positive difference.

Consequently, in your example, even though 'good' appears more often in Cat2, there are probably fewer terms overall in Cat1, so 'good' is relatively more important there. The bottom line is that I don't think the comparison cloud will do what you want.
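
To make the reasoning above concrete, here is a minimal base-R sketch of that kind of calculation. It illustrates the idea under stated assumptions and is not the package's own code. The 'good' and 'bad' rows come from the question; the 'fine' and 'okay' rows are made up purely so that each column has a total, since the rest of the real matrix is not shown.

```r
# Term counts per category. 'good' and 'bad' are from the question;
# 'fine' and 'okay' are hypothetical filler rows.
counts <- matrix(
  c(27, 33,  3,
    10,  6,  4,
     5, 40, 10,
     8, 41, 23),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("good", "bad", "fine", "okay"),
                  c("Cat1", "Cat2", "Cat3"))
)

# Turn raw counts into within-category rates (each column sums to 1).
props <- sweep(counts, 2, colSums(counts), "/")

# Subtract each term's average rate across categories from its rate
# in every category.
deviation <- props - rowMeans(props)

# Each term goes to the single category where its deviation is largest.
group <- colnames(counts)[apply(deviation, 1, which.max)]
names(group) <- rownames(counts)

# That largest deviation would drive the size of the word.
size <- apply(deviation, 1, max)

print(round(props, 3))
print(group)
```

With these numbers, 'good' makes up 27/50 = 0.54 of Cat1 but only 33/120 = 0.275 of Cat2, so it is assigned to Cat1 even though its raw count is higher in Cat2. Because each term is assigned to exactly one category, it can never be drawn twice.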