Calculate mean of new variable for cells of contingency table

I'm trying to summarize Likert-scale ratings with colored bubbles in a plot. I currently have a violin plot overlaid on a jittered, faceted scatterplot, which comes close to what I am trying to communicate but doesn't quite get there.

[Figure: faceted scatterplot of jittered 7-point ratings, shaded by a continuous variable]

Ideally, I would have one bubble for each point on the Likert scale, sized by the number (or proportion) of items that received that score and shaded by the mean value of the spKnownShown variable for those items. Making a contingency table for the Likert × facet × x-axis combinations is trivial, but how do I link each cell to the mean of spKnownShown? Any recommendations for getting from the contingency table to an actual plot would also be appreciated.

Apologies that I can't share the data, as it is under a confidentiality agreement.

Accepted answer by Weihuang Wong

Consider using functions from the dplyr package. I first make a fake dataset, where x, y, v, and f correspond to the x-axis variable, the Likert rating, the value whose mean you want (spKnownShown in your case), and the facet, respectively.

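library(ggplot2)
library(dplyr)

# Fake data: x = x-axis group, y = Likert rating (0 to 6, i.e. 7 points),
# v = value to average, f = facet
n <- 1000
set.seed(1)
d <- data.frame(x = sample(0:1, n, replace = TRUE),
                y = pmin(rpois(n, 2), 6),
                v = rnorm(n),
                f = sample(0:2, n, replace = TRUE))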

Computing the values you want combines group_by() and summarise() from dplyr: group by facet, x-axis variable, and rating, then take the count and the mean of v within each group:

# One row per facet/x/rating cell: number of items and mean of v
plt <- d %>%
  group_by(f, x, y) %>%
  summarise(n = n(), v = mean(v), .groups = "drop")
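If you'd rather stay close to the contingency-table framing in the question, base R can do the same linking without dplyr. This is a sketch using `table()` for the counts and `tapply()` with the same three grouping factors for the means. Because both return arrays with identical dimensions, each count cell lines up with its mean. It regenerates the same fake data so it runs on its own; `cnt`, `avg`, and `cells` are illustrative names, not part of the original answer.

```r
# Same fake data as in the answer: x = x-axis group, y = Likert rating,
# v = value to average, f = facet
n <- 1000
set.seed(1)
d <- data.frame(x = sample(0:1, n, replace = TRUE),
                y = pmin(rpois(n, 2), 6),
                v = rnorm(n),
                f = sample(0:2, n, replace = TRUE))

# Contingency table of counts over facet x x-axis x rating
cnt <- table(d[c("f", "x", "y")])

# Mean of v over the same three factors; the result has the same
# dimensions and dimnames as cnt, so cell [i, j, k] matches in both
avg <- tapply(d$v, d[c("f", "x", "y")], mean)

# Flatten to one row per cell (columns f, x, y, Freq) and attach the means;
# both arrays are flattened in the same column-major order
cells <- as.data.frame(cnt)
cells$v <- as.vector(avg)

# Drop empty cells (count 0, mean NA)
cells <- cells[cells$Freq > 0, ]
head(cells)
```

The resulting `cells` data frame can go into the same `ggplot()` call as `plt`, using `size = Freq` instead of `size = n`.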

Finally, plot:

# Bubble size = item count in the cell, colour = mean of v in the cell
ggplot(plt, aes(x = factor(x), y = factor(y), size = n, colour = v)) +
  geom_point() +
  facet_wrap(~ f)
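The question also mentions sizing bubbles by proportion rather than raw count. One way to do that, sketched below, is to normalise the counts within each facet × x-axis column so that the bubbles in one column sum to 1. That choice of denominator is an assumption; group by `f` alone instead if you want proportions per facet. `scale_size_area()` maps the value to bubble area (zero maps to no area), which tends to read more honestly than radius; `max_size = 12` is an arbitrary choice.

```r
library(ggplot2)
library(dplyr)

# Same fake data as above: x = x-axis group, y = Likert rating,
# v = value to average, f = facet
n <- 1000
set.seed(1)
d <- data.frame(x = sample(0:1, n, replace = TRUE),
                y = pmin(rpois(n, 2), 6),
                v = rnorm(n),
                f = sample(0:2, n, replace = TRUE))

plt <- d %>%
  group_by(f, x, y) %>%
  summarise(n = n(), v = mean(v), .groups = "drop") %>%
  # Share of items at each rating within one facet/x-axis column (assumed denominator)
  group_by(f, x) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

p <- ggplot(plt, aes(x = factor(x), y = factor(y), size = prop, colour = v)) +
  geom_point() +
  # Proportion drives bubble area rather than radius
  scale_size_area(max_size = 12) +
  facet_wrap(~ f) +
  labs(x = "x", y = "Likert rating", size = "Proportion", colour = "Mean of v")
```

Print `p` to draw the plot.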
