I am trying to write a function that uses `dplyr::summarise` to obtain the means of multiple columns of a data frame and to assign dynamic names to the summarised columns using the new `rlang` glue syntax and the `:=` operator.
Here's a simple example of my problem using the `mtcars` dataset. When summarising over just one column, the glue syntax works (i.e. the summarised column name is `mean_mpg`):
```r
library(dplyr)

mean_fun <- function(data, group_cols, summary_col) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_col }}" := mean({{ summary_col }}, na.rm = TRUE))
}

mean_fun(mtcars, c(cyl, gear), mpg)
```
```
    cyl  gear mean_mpg
  <dbl> <dbl>    <dbl>
1     4     3     21.5
2     4     4     26.9
3     4     5     28.2
4     6     3     19.8
5     6     4     19.8
6     6     5     19.7
7     8     3     15.0
8     8     5     15.4
```
But the equivalent function does not name the columns properly when summarising over multiple columns:
```r
mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    summarise("mean_{{ summary_cols }}" := across({{ summary_cols }}, ~ mean(., na.rm = TRUE)))
}

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
```
```
    cyl  gear `mean_c(mpg, wt)`$mpg   $wt
  <dbl> <dbl>                 <dbl> <dbl>
1     4     3                  21.5  2.46
2     4     4                  26.9  2.38
3     4     5                  28.2  1.83
4     6     3                  19.8  3.34
5     6     4                  19.8  3.09
6     6     5                  19.7  2.77
7     8     3                  15.0  4.10
8     8     5                  15.4  3.37
```
How can I get the summarised column names to read `mean_mpg` and `mean_wt`? And why does this not work?
I realise that there are likely many other ways to perform this task, but I would like to know how to get this method (i.e. using tidy eval and rlang syntax in a bespoke function) to work, for teaching purposes and for my own understanding!
Thank you
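We could use `.names` in `across` to rename the summarised columns. `.names` takes a glue specification in which `{.col}` stands for the name of each input column, so `"mean_{.col}"` produces `mean_mpg` and `mean_wt`.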
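A sketch of the function rewritten this way, assuming dplyr 1.0 or later (where `across()` and the `.groups` argument of `summarise()` exist). The glue-named `:=` is dropped entirely and `across()` names each output column itself. The `.groups = "drop"` argument is an addition not in the question; it only makes the result an ungrouped tibble and silences the regrouping message.

```r
library(dplyr)

mean_fun_multicols <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    # .names is a glue spec: {.col} is replaced by each input column's name,
    # so mpg -> mean_mpg and wt -> mean_wt
    summarise(across({{ summary_cols }}, ~ mean(.x, na.rm = TRUE),
                     .names = "mean_{.col}"),
              .groups = "drop")
}

mean_fun_multicols(mtcars, c(cyl, gear), c(mpg, wt))
```

The result has the columns `cyl`, `gear`, `mean_mpg` and `mean_wt`, with the same values as the question's output.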
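NOTE: `:=` is mainly used when creating a single column in the tidyverse. The glue string `"mean_{{ summary_cols }}"` is evaluated once, with the label of the whole argument substituted in, which is why the name becomes `mean_c(mpg, wt)`.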
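To see the naming rule in isolation, here is a small sketch using `rlang::englue()` (available from rlang 1.0.0), which applies the same `{{ }}` glue rule that `:=` uses for names in dynamic dots. The helper `name_for()` is hypothetical and exists only for illustration.

```r
library(rlang)

# Hypothetical helper: builds a name the same way "mean_{{ x }}" := ... does,
# by injecting the label of the defused argument x
name_for <- function(x) englue("mean_{{ x }}")

name_for(mpg)          # "mean_mpg"
name_for(c(mpg, wt))   # "mean_c(mpg, wt)"
```

The second call shows that `{{ }}` labels the entire expression `c(mpg, wt)` rather than each selected column.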
If we use the OP's function, we are assigning multiple columns to a single column, and this returns a tibble (a packed data frame column) instead of a normal column. We may need to `unpack` it with `tidyr::unpack`.
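A sketch of the `unpack` route, assuming dplyr 1.0 and tidyr 1.0 or later. Instead of gluing a name onto the packed column, it gives the packed data frame column the plain name `mean` (a choice made here for illustration) and lets `unpack(names_sep = "_")` paste that outer name onto each inner column name. `mean_fun_unpack()` is a hypothetical name.

```r
library(dplyr)
library(tidyr)

mean_fun_unpack <- function(data, group_cols, summary_cols) {
  data %>%
    group_by(across({{ group_cols }})) %>%
    # A named across() inside summarise() yields one data frame column, "mean",
    # whose inner columns are mpg and wt
    summarise(mean = across({{ summary_cols }}, ~ mean(.x, na.rm = TRUE)),
              .groups = "drop") %>%
    # names_sep pastes outer and inner names: mean + mpg -> mean_mpg
    unpack(mean, names_sep = "_")
}

mean_fun_unpack(mtcars, c(cyl, gear), c(mpg, wt))
```

Unpacking the OP's original result with `names_sep` would instead give names like `mean_c(mpg, wt)_mpg`, which is why the packed column is given a simple name first.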