Show missing data in a jittered plot

Question


Asked by uke on 30 November 2023 at 12:14 · 70 views

I have data from a questionnaire with 10 items, each scored 0 or 1. Due to time pressure, many items, especially the last ones, were not answered. This is intended, and an unanswered item counts as a score of 0. However, I want to preserve the NAs so that I can visualize them.

My goal is a plot that shows the raw data points jittered and some overlaid mean + error bars per item. The NA points should be plotted as well, at the side and in a different color, much like naniar::geom_miss_point() does. I have almost achieved this by overlaying geom_miss_point() and geom_jitter(). See the plots below.

example data creation

Not important; just copy and paste.

library(ggplot2)
library(naniar)
set.seed(1)

# create weights for adding NAs later
# items have more NAs if their position is later

weights <- numeric()
for (i in 1:10) {
  weights <- c(weights, rep(i, i))
}

s <- seq(0, 590, by = 10)
na <- s + sample(weights, 
                 size = length(s),
                 replace = TRUE)

na2 <- s + sample(weights,
           size = length(s),
           replace = TRUE)

na3 <- unique(c(na, na2))

item <- rep(1:10, 60) |> as.factor()
score <- runif(600) |> round()
score[na3] <- NA
id <- rep(1:60, each = 10)
dat <- data.frame(id, item, score)

# compute a separate score where NA are counted as zero
dat$na_score <- dat$score
dat$na_score[is.na(dat$score)] <- 0
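
A quick sanity check on the missingness pattern can help before plotting. The sketch below uses a small toy data frame (`toy`, with made-up values) as a stand-in for `dat`; only the column names `item` and `score` are taken from the code above.

```r
# Toy stand-in for `dat` (hypothetical values, same column names as above)
toy <- data.frame(
  item  = factor(rep(1:3, each = 4)),
  score = c(1, 0, 1, 1,  0, NA, 1, 0,  NA, NA, 1, NA)
)

# Share of missing answers per item; later items should have a larger share
na_share <- tapply(is.na(toy$score), toy$item, mean)
print(na_share)
```

Running the same `tapply()` call on `dat` should show whether later items really have more NAs.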

A) Perfect Jitter plot like I want it, but no `NA` shown

using geom_jitter()

ggplot(dat, aes(y = item, x = score)) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.3) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               aes(x = na_score)) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
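
For reference, here is a rough base-R sketch of what the summary layers compute when NA counts as zero. It assumes that `mean_cl_normal` (which needs the Hmisc package installed) gives a t-based 95% interval around the mean; the toy vector is made up for illustration.

```r
# Toy item: four respondents, two of whom did not answer (hypothetical values)
score <- c(1, 1, NA, NA)

# Count NA as zero, as in dat$na_score
na_score <- ifelse(is.na(score), 0, score)

n    <- length(na_score)
m    <- mean(na_score)                       # mean with NA counted as 0
se   <- sd(na_score) / sqrt(n)               # standard error of the mean
half <- qt(0.975, df = n - 1) * se           # half-width of a t-based 95% CI

ci <- c(ymin = m - half, y = m, ymax = m + half)
print(ci)

# For comparison: the mean when NAs are simply dropped
print(mean(score, na.rm = TRUE))
```

The gap between the two means is what the missing answers contribute to the lower item score.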

B) `NA`s shown, but jitter is only applied to `NA` and not normal data points :(

using geom_miss_point()

This is nice, because it shows how NAs are dragging down the mean score.

ggplot(dat, aes(y = item, x = score)) +
  geom_miss_point(alpha = 0.08) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               size = 1,
               aes(x = na_score)) +
  scale_color_manual(values = c("blue","black")) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")

C) Cheating by overlaying A and B (Almost Desired Output)

This is close to my desired output, but ideally NA and normal points would have the same jitter, both horizontally and vertically. The main concern here is that normal data points are plotted twice, once as a jitter in geom_jitter() and once with geom_miss_point(). It would be easily hidden by tweaking alpha, but I exaggerated it to show the problem here.

ggplot(dat, aes(y = item, x = score)) +
  geom_miss_point(alpha = 0.08) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.1) +
  stat_summary(fun.data = "mean_cl_normal",
               geom = "errorbar",
               aes(x = na_score)) +
  stat_summary(fun = mean,
               geom = "point",
               color = "red",
               size = 1,
               aes(x = na_score)) +
  scale_color_manual(values = c("blue","black")) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")

How can I achieve my desired output?

?geom_miss_point mentions the use of ggobi methods to plot two things on the same axis. Maybe this is the way to go.


There is 1 answer

**stefan** · Accepted Answer · 2023-11-30T12:53:02+00:00

Perhaps I'm missing something, but one option would be to plot the missing and non-missing data with a single geom_jitter, without needing geom_miss_point. The NAs are recoded to a position just left of 0 (here -0.1) and colored by missingness:

library(ggplot2)

dat$x <- dat$score
dat$x[is.na(dat$x)] <- -.1

ggplot(dat, aes(y = item, x = score)) +
  geom_jitter(
    aes(x = x, color = !is.na(score)),
    height = 0.2, width = 0.05, alpha = 0.1
  ) +
  stat_summary(
    fun.data = "mean_cl_normal",
    geom = "errorbar",
    aes(x = na_score)
  ) +
  stat_summary(
    fun = mean,
    geom = "point",
    color = "red",
    size = 1,
    aes(x = na_score)
  ) +
  scale_color_manual(
    name = "missing",
    values = c("blue", "black"),
    labels = c("Missing", "Not missing")
  ) +
  labs(subtitle = "mean score (red) and 95% CI error bars\nmean + error bars count NA as zero")
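
A possible refinement, sketched here on toy data rather than on `dat`: label the recoded position as "NA" on the x axis, and map color to an explicit factor so the manual colors are matched by name rather than by order. The names `toy`, `na_pos` and `status` are made up for this illustration, and the summary layers are left out to keep it short.

```r
library(ggplot2)

# Toy stand-in for `dat` (hypothetical values)
toy <- data.frame(
  item  = factor(rep(1:3, each = 4)),
  score = c(1, 0, 1, 1,  0, NA, 1, 0,  NA, NA, 1, NA)
)

# Position on the x axis where missing answers are drawn
na_pos <- -0.1
toy$x <- ifelse(is.na(toy$score), na_pos, toy$score)

# Explicit missingness factor, so colors can be assigned by level name
toy$status <- factor(ifelse(is.na(toy$score), "Missing", "Not missing"))

p <- ggplot(toy, aes(y = item, x = x, color = status)) +
  geom_jitter(height = 0.2, width = 0.05, alpha = 0.5) +
  # Show "NA" instead of -0.1 on the axis
  scale_x_continuous(breaks = c(na_pos, 0, 1), labels = c("NA", "0", "1")) +
  scale_color_manual(
    name = NULL,
    values = c("Missing" = "blue", "Not missing" = "black")
  ) +
  labs(x = "score")

print(p)
```

Because both groups go through the same `geom_jitter()` call, missing and non-missing points get the same horizontal and vertical jitter, and no point is drawn twice.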
