ggplot2 add sum to chart


Using mtcars as an example, I've produced some violin plots. I wanted to add two things to this chart:

  1. for each group, list n
  2. for each group, sum a third variable (e.g. wt)

I can do (1) with the geom_text code below although (n) is actually plotted on the x axis rather than off to the side.

But I can't work out how to do (2).

Any help much appreciated!

library(ggplot2)
library(gridExtra)
library(ggthemes)

result <- mtcars

ggplot(result, aes(x = gear, y = drat, group = gear)) +
  theme_tufte(base_size = 15) + theme(line=element_blank()) +
  geom_violin(fill = "white") +
  geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
  ylab("drat") + 
  xlab("gear") +
  coord_flip()+
  geom_text(stat = "count", aes(label = after_stat(count), y = after_stat(count)))

There are 2 answers

Miksmith

Thanks to those who helped. In the end I used the code below, which plots the calculated values. One set of my classes is text-based, so gear is converted to a factor here and vjust is used to offset the two labels vertically.

Thanks again!

library(ggplot2)
library(gridExtra)
library(ggthemes)
library(dplyr) # provides %>%, group_by() and summarise() used below

results <- mtcars
results$gear <- as.factor(as.character(results$gear)) #Turn 'gear' to text to simulate classes, then factorise

result_sum <- results %>%
  group_by(gear) %>%
  summarise(count = n(), sum_wt = sum(wt))

ggplot(results, aes(x = gear, y = drat, group=gear)) +
  theme_tufte(base_size = 15) + theme(line=element_blank()) +
  geom_violin(fill = "white") +
  geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
  ylab("drat") + 
  xlab("gear") +
  coord_flip()+
  geom_text(data = result_sum, aes(label = paste0("n = ", count), x = (gear), vjust= 0, y = 5.25)) +
  geom_text(data = result_sum, aes(label = paste0("sum wt = ", round(sum_wt,0)), x = (gear), vjust= -2, y = 5.25))
tbradley

You can add both annotations by computing them in your data frame just before plotting. Using the dplyr package, create two new columns: one with the count for each group and one with the sum of wt for each group. The result can be piped directly into ggplot() with %>% (alternatively, you could save the new dataset and pass it to ggplot() the way you have it). Then, with some minor edits to your geom_text call and a second geom_text call, we get the plot you want. The code looks like this:

library(ggplot2)
library(gridExtra)
library(ggthemes)
library(magrittr)
library(dplyr)

result <- mtcars

result %>%
  group_by(gear) %>%
  mutate(count = n(), sum_wt = sum(wt)) %>%
  ggplot(aes(x = gear, y = drat, group = gear)) +
    theme_tufte(base_size = 15) + theme(line=element_blank()) +
    geom_violin(fill = "white") +
    geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
    ylab("drat") + 
    xlab("gear") +
    coord_flip()+
    geom_text(aes(label = paste0("n = ", count), 
                  x = (gear + 0.25), 
                  y = 4.75)) +
    geom_text(aes(label = paste0("sum wt = ", sum_wt), 
                  x = (gear - 0.25),
                  y = 4.75)) 

The new graph looks like this:

Alternatively, you can create a summary data frame named result_sum and pass it to the geom_text calls through their data argument.

result <- mtcars %>%
  mutate(gear = factor(as.character(gear)))

result_sum <- result %>%
  group_by(gear) %>%
  summarise(count = n(), sum_wt = sum(wt))


ggplot(result, aes(x = gear, y = drat, group = gear)) +
  theme_tufte(base_size = 15) + 
  theme(line=element_blank()) +
  geom_violin(fill = "white") +
  geom_boxplot(fill = "black", alpha = 0.3, width = 0.1) +
  ylab("drat") + 
  xlab("gear") +
  coord_flip()+
  geom_text(data = result_sum, aes(label = paste0("n = ", count), 
                                   x = (as.numeric(gear) + 0.25), 
                                   y = 4.75)) +
  geom_text(data = result_sum, aes(label = paste0("sum wt = ", sum_wt), 
                                   x = (as.numeric(gear) - 0.25),
                                   y = 4.75))

This gives you this:

The benefit of this second method is that the text isn't bold as it is in the first graph. The bold effect occurs there because each label is drawn once per observation in the data frame, so identical text is printed on top of itself.
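To see why the first approach overprints, it may help to compare the row counts of the two data frames. The snippet below is only a sketch of that check; the names per_row, per_group and labels_only are made up for illustration and do not appear in the code above. It also shows one possible way to keep the mutate() pipeline and still get a single row per label, using dplyr's distinct().

```r
library(dplyr)

# mutate() keeps one row per car, so geom_text() would draw
# the same label once for every row in a gear group
per_row <- mtcars %>%
  group_by(gear) %>%
  mutate(count = n(), sum_wt = sum(wt)) %>%
  ungroup()

# summarise() collapses to one row per gear, so each label is drawn once
per_group <- mtcars %>%
  group_by(gear) %>%
  summarise(count = n(), sum_wt = sum(wt))

nrow(per_row)    # 32 rows, one per car
nrow(per_group)  # 3 rows, one per gear value

# Hypothetical alternative: keep the mutate() result but drop the
# duplicated label rows before handing it to geom_text(data = ...)
labels_only <- per_row %>%
  distinct(gear, count, sum_wt)
```

Either per_group or labels_only could then be passed as the data argument of the geom_text calls, which should avoid the stacked, bold-looking text.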