I am trying to plot Likert-type data, and am using code adapted from this site: https://scisley.github.io/articles/ggplot-likert-plot/
The code to reproduce the example:
# load packages
library(dplyr)
library(ggplot2)
library(knitr)
library(tidyr)
library(scales)
# Simulation N responses
N <- 50
answers <- c("Strongly Disagree","Somewhat Disagree","Neither Agree nor Disagree",
"Somewhat Agree", "Strongly Agree")
set.seed(12342)
d <- tibble(
id = paste0("Respondent", 1:N),
Q1 = sample(answers, N, replace=TRUE),
Q2 = sample(answers, N, replace=TRUE),
Q3 = sample(answers, N, replace=TRUE),
Q4 = sample(answers, N, replace=TRUE),
Q5 = sample(answers, N, replace=TRUE)
)
# reduce data
d.reduced <- d %>%
select(-id) %>%
gather("Q", "ans") %>%
group_by(Q, ans) %>%
summarize(n=n()) %>%
mutate(per = n/sum(n),
ans = factor(ans, levels=answers)) %>%
arrange(Q, ans)
# create plot data
stage1 <- d.reduced %>%
mutate(text = paste0(formatC(100 * per, format="f", digits=0), "%"),
cs = cumsum(per),
offset = sum(per[1:(floor(n()/2))]) + (n() %% 2)*0.5*(per[ceiling(n()/2)]),
xmax = -offset + cs,
xmin = xmax-per) %>%
ungroup()
# order plot data
gap <- 0.2
stage2 <- stage1 %>%
left_join(stage1 %>%
group_by(Q) %>%
summarize(max.xmax = max(xmax)) %>%
mutate(r = row_number(max.xmax)),
by = "Q") %>%
arrange(desc(r)) %>%
mutate(ymin = r - (1-gap)/2,
ymax = r + (1-gap)/2)
# create plot
ggplot(stage2) +
geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill=ans)) +
geom_text(aes(x=(xmin+xmax)/2, y=(ymin+ymax)/2, label=text), size = 3) +
scale_x_continuous("", labels=percent, breaks=seq(-1, 1, len=9), limits=c(-1, 1)) +
scale_y_continuous("", breaks = 1:n_distinct(stage2$Q),
labels=rev(stage2 %>% distinct(Q) %>% .$Q)) +
scale_fill_brewer("", palette = "BrBG")
The issue I have is that this requires all respondents to answer all questions, since it is assuming all rectangles will have the same length. If I have NA
s, like I have in d3
below, the bars do not center properly in the plot.
# create data.frame with NAs
d2 <- apply (d[2:ncol(d)], 2,
function(x) {x[sample( c(1:N), floor(N/10))] <- NA; x} )
d3 <- cbind(d[,1], d2)
How can the code be adjusted to create a plot for d3
like we have for d
above? I think I have isolated the issue to the offset
in stage1
but I can't see how to adjust it to account for NA
s.
If you want to remove NA values, do it at the summary stage before calculating percentages. To be honest, the code you have there seems a bit more complex than it needs to be. Here's a shorter version in a single pipeline that additionally handles NA values and puts the questions in the correct order: