Show missing data in a jittered plot

I have data from a questionnaire. It has 10 items, where one can score 0 or 1 in each item. Due to time pressure, many items, especially the last ones, were not answered, which is intended and counts as a score of 0. However, I want to preserve the NAs for visualizing them.

My goal is a plot that shows the raw data points jittered and some overlaid mean + error bars per item. The NA points should be plotted as well, at the side and in a different color, much like naniar::geom_miss_point() does. I have almost achieved this by overlaying geom_miss_point() and geom_jitter(). See the plots below.

example data creation

not important, just copy paste

library(ggplot2)
library(naniar)
# note: stat_summary(fun.data = "mean_cl_normal") below also needs the Hmisc package installed
set.seed(1)

# create weights for adding NAs later
# items have more NAs if their position is later

# value i appears i times, so later positions are drawn more often
weights <- rep(1:10, times = 1:10)

s <- seq(0, 590, by = 10)
na <- s + sample(weights,
                 size = length(s),
                 replace = TRUE)

na2 <- s + sample(weights,
                  size = length(s),
                  replace = TRUE)

na3 <- unique(c(na, na2))

item <- rep(1:10, 60) |> as.factor()
score <- runif(600) |> round()
score[na3] <- NA
id <- rep(1:60, each = 10)
dat <- data.frame(id, item, score)

# compute a separate score where NA are counted as zero
dat$na_score <- dat$score
dat$na_score[is.na(dat$score)] <- 0

A) Perfect jitter plot, just as I want it, but no NAs shown

using geom_jitter()

plotA

ggplot(dat, aes(y = item, x = score)) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.3) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               aes(x = na_score)) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")

B) NAs shown, but jitter is applied only to the NAs, not to the regular data points :(

using geom_miss_point()

This is nice, because it shows how NAs are dragging down the mean score.

plotB

ggplot(dat, aes(y = item, x = score)) +
  geom_miss_point(alpha = 0.08) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               size = 1,
               aes(x = na_score)) +
  scale_color_manual(values = c("blue","black")) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")

C) Cheating by overlaying A and B (almost the desired output)

This is close to my desired output, but ideally NA and normal points would have the same jitter, both horizontally and vertically. The main concern here is that normal data points are plotted twice, once as a jitter in geom_jitter() and once with geom_miss_point(). It would be easily hidden by tweaking alpha, but I exaggerated it to show the problem here.

plotC

ggplot(dat, aes(y = item, x = score)) +
  geom_miss_point(alpha = 0.08) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.1) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               size = 1,
               aes(x = na_score)) +
  scale_color_manual(values = c("blue","black")) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")

How can I achieve my desired output?

?geom_miss_point mentions the use of ggobi methods to plot two things on the same axis. Maybe this is a way to go.
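For reference, the idea behind that approach, as far as the naniar documentation describes it, is to place missing values about 10% below the observed minimum so they share an axis with the real data. Below is a minimal base R sketch of that shift. Note that shift_missing_below() is a hypothetical helper written only for illustration, not a naniar function, and the 10% proportion is an assumption taken from that description.

```r
# Hypothetical helper (not part of naniar): replace each NA with a value
# placed a fixed proportion of the observed range below the observed minimum.
shift_missing_below <- function(x, prop = 0.1) {
  rng <- range(x, na.rm = TRUE)              # observed minimum and maximum
  x[is.na(x)] <- rng[1] - prop * diff(rng)   # e.g. 0 - 0.1 * (1 - 0)
  x
}

score <- c(0, 1, NA, 1, NA, 0)
shift_missing_below(score)
```

For 0/1 scores like the ones here, this puts every missing value at -0.1, so a single jittered layer could draw both kinds of points.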

There is 1 answer

stefan (best answer)

Perhaps I'm missing something, but one option would be to plot the missing and non-missing data with a single geom_jitter, without needing geom_miss_point at all:

library(ggplot2)

# plotting position: the observed score, with NAs placed at -0.1 (just left of 0)
dat$x <- dat$score
dat$x[is.na(dat$x)] <- -.1

ggplot(dat, aes(y = item, x = score)) +
  geom_jitter(
    aes(x = x, color = !is.na(score)),
    height = 0.2, width = 0.05, alpha = 0.1
  ) +
  stat_summary(
    fun.data = "mean_cl_normal",
    geom = "errorbar",
    aes(x = na_score)
  ) +
  stat_summary(
    fun = mean,
    geom = "point",
    color = "red",
    size = 1,
    aes(x = na_score)
  ) +
  scale_color_manual(
    name = "missing",
    values = c("blue", "black"),
    labels = c("Missing", "Not missing")
  ) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
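One possible refinement, sketched here under assumptions rather than taken from the answer: because the missing values sit at an arbitrary position (-0.1), the x axis can be relabelled so that position reads "NA". The toy data frame, the status column and the chosen jitter width below are all made up for illustration; with the real data, dat would take the place of toy.

```r
library(ggplot2)

# Toy data standing in for `dat` (assumed columns: item, score)
set.seed(1)
toy <- data.frame(
  item  = factor(rep(1:3, each = 20)),
  score = sample(c(0, 1, NA), size = 60, replace = TRUE)
)

# Flag missing values and give them a plotting position just left of 0
toy$status <- factor(ifelse(is.na(toy$score), "Missing", "Not missing"))
toy$x <- ifelse(is.na(toy$score), -0.1, toy$score)

p <- ggplot(toy, aes(x = x, y = item, color = status)) +
  # one jitter layer, so missing and observed points get the same jitter
  geom_jitter(height = 0.2, width = 0.02, alpha = 0.5) +
  # show only the three meaningful positions and name the NA column
  scale_x_continuous(breaks = c(-0.1, 0, 1), labels = c("NA", "0", "1")) +
  scale_color_manual(
    name = NULL,
    values = c("Missing" = "blue", "Not missing" = "black")
  ) +
  labs(x = "score")

p
```

Since geom_jitter spreads points by width in both directions, width = 0.05 makes the NA column (centred at -0.1) and the 0 column just touch. A smaller width, or a position such as -0.2 for the missing values, may separate them more clearly.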