How to reduce empty space between variable-width boxplots in ggplot2?


I'm trying to plot a set of variable-width boxplots, but the spacing between the categories on the x-axis seems to be set by the widest boxplot, which leaves a lot of extra space around the narrower ones.

library(ggplot2)

#example data:
inputdata <- data.frame(
                        group=c('group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group1', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2', 'group2'),
                        gene_match=c('gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_c', 'gene_c', 'gene_d', 'gene_d', 'gene_e', 'gene_f', 'gene_f', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_b', 'gene_d', 'gene_d', 'gene_f', 'gene_f', 'gene_f'),
                        evalue=c(1.3, 2.2, 6.8, 4.3, 15, 6.8, 12, 5.3, 14, 7.6, 9.7, 8, 6.8, 7.5, 5.6, 5.2, 9.2, 9.3, 0.4, 5.7, 11, 3.4, 10, 12, 4.5, 5.9, 1.3, 2.6, 9.8, 4.9, 9.4, 4.7, 9.7, 7.8, 5.1, 9.9, 3.3, 2, 5.7)
                        )

#variable-width boxplots:

ggplot(data=inputdata) +
       geom_boxplot(mapping=aes(x=reorder(gene_match, evalue, FUN=median),
                                y=evalue, 
                                fill=group, color=group), varwidth=TRUE) + 
       theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + 
       scale_fill_manual(name= "dataset", values = c("coral1", "turquoise2"))+
       scale_color_manual(name = "dataset", values = c("coral4", "turquoise4"))
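To make the spacing issue concrete: the ggplot2 documentation says varwidth = TRUE draws boxes "with widths proportional to the square-roots of the number of observations in the groups", and, as far as I understand the implementation, each box is scaled relative to the largest one while every x category keeps the same slot. The base-R sketch below (no ggplot2 needed) computes those relative widths for the example data under that assumption; the names counts and rel_width are just illustrative.

```r
# Example data from the question
inputdata <- data.frame(
  group = c("group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2"),
  gene_match = c("gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_c", "gene_c", "gene_d", "gene_d", "gene_e", "gene_f", "gene_f", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_d", "gene_d", "gene_f", "gene_f", "gene_f"),
  evalue = c(1.3, 2.2, 6.8, 4.3, 15, 6.8, 12, 5.3, 14, 7.6, 9.7, 8, 6.8, 7.5, 5.6, 5.2, 9.2, 9.3, 0.4, 5.7, 11, 3.4, 10, 12, 4.5, 5.9, 1.3, 2.6, 9.8, 4.9, 9.4, 4.7, 9.7, 7.8, 5.1, 9.9, 3.3, 2, 5.7)
)

# Number of observations behind each box (one box per gene/group pair).
# aggregate() with a formula only returns combinations that actually occur.
counts <- aggregate(evalue ~ gene_match + group, data = inputdata, FUN = length)
names(counts)[names(counts) == "evalue"] <- "n"

# Assumed varwidth rule: width proportional to sqrt(n), scaled so that
# the box with the most observations gets the full width.
counts$rel_width <- sqrt(counts$n) / max(sqrt(counts$n))

# Narrowest boxes first
counts[order(counts$rel_width), ]
```

Boxes backed by one or two observations end up at only a fraction of the widest box's width, which is why widening the window enough to make them legible also inflates the gaps between categories.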

Increasing the width of the window introduces a lot of blank space between the x-axis categories:
[Image: plot with empty space]

But shrinking the window makes the narrower boxplots hard to see (this looks worse with my real data, which has 11 categories on the x-axis):
[Image: plot with boxplots too narrow]

I want to keep the boxplots wide enough that the smaller ones are easy to see, but without introducing a lot of blank space, more like this photoshopped version:
[Image: easier to read plot]

I've tried experimenting with the weight aesthetic and a couple of other things, but haven't gotten anything to work.


Answer by Carl

Does something like this work with your real data?

It retains varwidth = TRUE, but uses facet_wrap() with scales = "free_x", so each gene gets its own panel containing only the groups present for that gene:

library(tidyverse)

inputdata <- data.frame(
  group = c("group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2"),
  gene_match = c("gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_c", "gene_c", "gene_d", "gene_d", "gene_e", "gene_f", "gene_f", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_d", "gene_d", "gene_f", "gene_f", "gene_f"),
  evalue = c(1.3, 2.2, 6.8, 4.3, 15, 6.8, 12, 5.3, 14, 7.6, 9.7, 8, 6.8, 7.5, 5.6, 5.2, 9.2, 9.3, 0.4, 5.7, 11, 3.4, 10, 12, 4.5, 5.9, 1.3, 2.6, 9.8, 4.9, 9.4, 4.7, 9.7, 7.8, 5.1, 9.9, 3.3, 2, 5.7)
)

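inputdata |> 
  mutate(
    # tidy the labels ("gene_b" -> "gene b") and order genes by median e-value
    gene_match = str_replace(gene_match, "_", " "),
    gene_match = reorder(gene_match, evalue, median)
    ) |> 
  ggplot(aes(group, evalue, fill = group, color = group)) +
  geom_boxplot(varwidth = TRUE) +
  # one panel per gene; free_x keeps only the groups present in each panel
  facet_wrap(~ gene_match, nrow = 1, 
             strip.position = "bottom", scales = "free_x") +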
  scale_fill_manual(name = "dataset", values = c("coral1", "turquoise2")) +
  scale_color_manual(name = "dataset", values = c("coral4", "turquoise4")) +
  labs(x = NULL) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.text.x = element_text(angle = 90, hjust = 1),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    panel.border = element_blank()
  )

Created on 2024-02-27 with reprex v2.1.0
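A possible variant, not part of the original answer and untested on the real 11-category data: facet_grid() accepts space = "free_x" in addition to scales = "free_x", which lets each panel's width follow the size of its own x scale. Genes that only have one group (gene_c and gene_e in the example) should then get roughly half-width panels instead of the same width as the two-box panels. switch = "x" moves the strips to the bottom (standing in for strip.position = "bottom" in facet_wrap()), and strip.placement = "outside" keeps them outside the axis area.

```r
library(ggplot2)

# Example data from the question
inputdata <- data.frame(
  group = c("group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group1", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2", "group2"),
  gene_match = c("gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_c", "gene_c", "gene_d", "gene_d", "gene_e", "gene_f", "gene_f", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_b", "gene_d", "gene_d", "gene_f", "gene_f", "gene_f"),
  evalue = c(1.3, 2.2, 6.8, 4.3, 15, 6.8, 12, 5.3, 14, 7.6, 9.7, 8, 6.8, 7.5, 5.6, 5.2, 9.2, 9.3, 0.4, 5.7, 11, 3.4, 10, 12, 4.5, 5.9, 1.3, 2.6, 9.8, 4.9, 9.4, 4.7, 9.7, 7.8, 5.1, 9.9, 3.3, 2, 5.7)
)

# Order genes by median e-value, as in the question
inputdata$gene_match <- reorder(factor(inputdata$gene_match), inputdata$evalue, FUN = median)

p <- ggplot(inputdata, aes(group, evalue, fill = group, color = group)) +
  geom_boxplot(varwidth = TRUE) +
  # One column per gene: scales = "free_x" drops groups absent from a panel,
  # space = "free_x" sizes each panel by its own x range,
  # switch = "x" draws the gene strips at the bottom
  facet_grid(. ~ gene_match, scales = "free_x", space = "free_x", switch = "x") +
  scale_fill_manual(name = "dataset", values = c("coral1", "turquoise2")) +
  scale_color_manual(name = "dataset", values = c("coral4", "turquoise4")) +
  labs(x = NULL) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    strip.placement = "outside",
    strip.text.x = element_text(angle = 90, hjust = 1),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )

p
```

The trade-off compared with facet_wrap() is that facet_grid() always lays panels out in a single row here, which matches the nrow = 1 used above anyway.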