Plotting Likert-type questions with some non-respondents

103 views Asked by At

I am trying to plot Likert-type data, and am using code adapted from this site: https://scisley.github.io/articles/ggplot-likert-plot/

The code to reproduce the example:

# load packages
library(dplyr)
library(ggplot2)
library(knitr)
library(tidyr)
library(scales)

# Simulation N responses
N <- 50
answers <- c("Strongly Disagree","Somewhat Disagree","Neither Agree nor Disagree",
              "Somewhat Agree", "Strongly Agree")
set.seed(12342)
d <- tibble(
  id = paste0("Respondent", 1:N),
  Q1 = sample(answers, N, replace=TRUE),
  Q2 = sample(answers, N, replace=TRUE),
  Q3 = sample(answers, N, replace=TRUE),
  Q4 = sample(answers, N, replace=TRUE),
  Q5 = sample(answers, N, replace=TRUE)
)

# reduce data
d.reduced <- d %>%
  select(-id) %>%
  gather("Q", "ans") %>%
  group_by(Q, ans) %>%
  summarize(n=n()) %>%
  mutate(per = n/sum(n),
         ans = factor(ans, levels=answers)) %>%
  arrange(Q, ans)

# create plot data
stage1 <- d.reduced %>%
  mutate(text = paste0(formatC(100 * per, format="f", digits=0), "%"),
         cs = cumsum(per),
         offset = sum(per[1:(floor(n()/2))]) + (n() %% 2)*0.5*(per[ceiling(n()/2)]),
         xmax = -offset + cs,
         xmin = xmax-per) %>%
  ungroup()

# order plot data
gap <- 0.2
stage2 <- stage1 %>%
  left_join(stage1 %>%
              group_by(Q) %>%
              summarize(max.xmax = max(xmax)) %>%
              mutate(r = row_number(max.xmax)),
            by = "Q") %>%
  arrange(desc(r)) %>%
  mutate(ymin = r - (1-gap)/2,
         ymax = r + (1-gap)/2)

# create plot
ggplot(stage2) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill=ans)) +
  geom_text(aes(x=(xmin+xmax)/2, y=(ymin+ymax)/2, label=text), size = 3) +
  scale_x_continuous("", labels=percent, breaks=seq(-1, 1, len=9), limits=c(-1, 1)) +
  scale_y_continuous("", breaks = 1:n_distinct(stage2$Q),
                     labels=rev(stage2 %>% distinct(Q) %>% .$Q)) +
  scale_fill_brewer("", palette = "BrBG")

The issue I have is that this requires all respondents to answer all questions, since it is assuming all rectangles will have the same length. If I have NAs, like I have in d3 below, the bars do not center properly in the plot.

# create data.frame with NAs
d2 <- apply (d[2:ncol(d)], 2,
             function(x) {x[sample( c(1:N), floor(N/10))] <- NA; x} )
d3 <- cbind(d[,1], d2) 

How can the code be adjusted to create a plot for d3 like we have for d above? I think I have isolated the issue to the offset in stage1 but I can't see how to adjust it to account for NAs.

2

There are 2 answers

0
Allan Cameron On BEST ANSWER

If you want to remove NA values, do it at the summary stage before calculating percentages. To be honest, the code you have there seems a bit more complex than it needs to be. Here's a shorter version in a single pipeline that additionally handles NA values and puts the questions in the correct order:

d3 %>%
  pivot_longer(-id, names_to = "Q", values_to = "ans") %>%
  count(Q, ans) %>%
  filter(complete.cases(.)) %>%
  mutate(percent = n/sum(n), .by = Q) %>%
  mutate(Q = factor(Q, colnames(d)[-1]),
         ans = factor(ans, answers)) %>%
  arrange(Q, ans) %>%
  mutate(xmin = cumsum(lag(percent, 1, 0)), 
         xmax = cumsum(percent), .by = Q) %>%
  mutate(offset = (xmax[3] + xmax[2])/2, .by = Q) %>%
  mutate(across(starts_with("xm"), ~.x - offset)) %>%
  ggplot(aes(y = Q)) +
  geom_linerange(aes(xmin = xmin, xmax = xmax, color = ans), linewidth = 30,
                 key_glyph = draw_key_rect) +
  geom_text(aes((xmin + xmax)/2, label = percent(percent, 1)), size = 3) +
  scale_x_continuous(NULL, labels = percent, breaks = seq(-1, 1, len = 9), 
                     limits = c(-1, 1)) +
  scale_color_brewer(NULL, palette = "BrBG")

enter image description here

0
stefan On

Just as a reference: The recently published ggstats package offers a position_likert (and a gglikert() function and ...) which makes creating Likert charts a breeze. It also has an option exclude_fill_values to exclude values from being displayed but taken into account for computing proportions.

library(ggplot2)
library(ggstats)
library(tidyr)

d3 |>
  tidyr::pivot_longer(-id, names_to = "q", values_to = "ans") |>
  ggplot(aes(y = q, fill = ans)) +
  geom_bar(
    position = ggstats::position_likert(exclude_fill_values = NA)
  ) +
  geom_text(
    aes(label = after_stat(
      scales::percent(ave(count, y, FUN = \(x) x / sum(x)))
    )),
    position = ggstats::position_likert(
      vjust = .5, exclude_fill_values = NA
    ), stat = "count"
  ) +
  scale_x_continuous(labels = label_percent_abs()) +
  scale_fill_brewer("", palette = "BrBG")

enter image description here