Summarise multiple functions at once using tidyeval in dplyr 1.0


Say we have a data frame:

library(tidyverse)
library(rlang)

set.seed(1)  # make the random grades reproducible
df <- tibble(id = rep(1:2, 10),
             grade = sample(c("A", "B", "C"), 20, replace = TRUE))

We would like the share of each grade within each id (the mean of `grade == "A"`, and so on):

df %>% 
    group_by(id) %>% 
    summarise(
        n = n(),
        mu_A = mean(grade == "A"),
        mu_B = mean(grade == "B"),
        mu_C = mean(grade == "C")
    )

In my real case there are many conditions (many grades here), so writing one line per grade does not scale and is easy to get wrong. How can we simplify this using tidy evaluation in dplyr 1.0?

The idea is to generate all the column names and summaries by passing every grade at once, without breaking the dplyr pipe. Something like:

# pseudocode: how to get the mean of A, B and C all at once?
mu_{grade} := mean(grade == {grade})
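For reference, rlang (0.4.3 onward, as far as I know, which is what dplyr 1.0 builds on) already accepts glue strings on the left-hand side of `:=`, so the wished-for syntax does work for a single grade. A minimal sketch, using a small hypothetical data frame instead of the random one above so the result is predictable:

```r
library(dplyr)

# Hypothetical, deterministic data (not the question's random sample)
df <- tibble(id    = rep(1:2, 3),
             grade = c("A", "B", "A", "C", "B", "B"))

g <- "A"  # a single grade; "mu_{g}" is interpolated to "mu_A"
res <- df %>%
  group_by(id) %>%
  summarise("mu_{g}" := mean(grade == g))

res
```

This still produces one column per `:=`; covering every grade in one `summarise()` call needs a named list of expressions, which is what the answer below builds.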

Answer by ThomasJc:

I found the answer to my own question in a post I wrote two years ago.

I am posting the code below in case it helps anyone who runs into the same problem.

# build one unevaluated call, mean(grade == '<g>'), for each grade in x
make_expr <- function(x) {
    x %>%
        map(~ parse_expr(str_glue("mean(grade == '{.x}')")))
}
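To see what `make_expr()` returns, here is a small standalone check. It assumes the same packages the answer relies on (purrr for `map()`, stringr for `str_glue()`, rlang for `parse_expr()`). Each element is an unevaluated call, not a number; it is only evaluated later, inside `summarise()`:

```r
library(purrr)
library(stringr)
library(rlang)

make_expr <- function(x) {
  x %>%
    map(~ parse_expr(str_glue("mean(grade == '{.x}')")))
}

built <- make_expr(c("A", "B"))

# The first element is the call mean(grade == "A"), not its value
built[[1]]
is_call(built[[1]])
```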

# generate one expression per grade and name them mu_A, mu_B, mu_C
grades <- c("A", "B", "C")
exprs  <- grades %>% make_expr() %>% set_names(paste0("mu_", grades))
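Going through strings and `parse_expr()` works, but the same named list can also be built with `expr()` and `!!`, which injects each grade as a value instead of pasting it into source text (so a grade containing a quote character cannot break the parse). This is a swapped-in technique, not the answer's own; a sketch on the same kind of small hypothetical data:

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical, deterministic data
df <- tibble(id    = rep(1:2, 3),
             grade = c("A", "B", "A", "C", "B", "B"))

grades <- c("A", "B", "C")

# `!!.x` injects the grade string itself; no code is assembled from text
exprs <- map(grades, ~ expr(mean(grade == !!.x))) %>%
  set_names(paste0("mu_", grades))

res <- df %>%
  group_by(id) %>%
  summarise(n = n(), !!!exprs)

res
```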

# prepend extra summaries, e.g. the group size, as additional named elements
exprs <- c(n = parse_expr("n()"), exprs) 

# splice the named list into summarise() with the big-bang operator `!!!`;
# each element becomes one summary column named after its list name
df %>% group_by(id) %>% summarise(!!!exprs)
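If the pattern comes up often, the splice can be wrapped in a function that takes the column to test as an argument. A minimal sketch; `summarise_props()` and its `prefix` argument are hypothetical names, not part of dplyr. It captures the column with `enquo()` so it can be passed unquoted, and parenthesises `(!!col)` so the injected column, not the whole comparison, is the left operand of `==`:

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical helper: share of each value in `levels` per group, plus group size
summarise_props <- function(data, col, levels, prefix = "mu") {
  col <- enquo(col)
  exprs <- map(levels, ~ expr(mean((!!col) == !!.x))) %>%
    set_names(paste0(prefix, "_", levels))
  summarise(data, n = n(), !!!exprs)
}

# Hypothetical, deterministic data
df <- tibble(id    = rep(1:2, 3),
             grade = c("A", "B", "A", "C", "B", "B"))

res <- df %>%
  group_by(id) %>%
  summarise_props(grade, c("A", "B", "C"))

res
```

Keeping the grouping outside the helper means it works with whatever `group_by()` the caller already has in the pipe.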