Writing a function in R to group by variable column from a data frame

Question

3.5k views Asked by Sam At 09 December 2024 at 14:22

I am trying to write a function that will allow me to produce descriptive statistics by grouping across multiple factors in a data frame. I have spent way too many hours trying to get my function to recognize the by variables I am selecting.

Here is the fake data:

grouping1 <- c("red", "blue", "blue", "green", "red", "blue", "red", "green")                 
grouping2 <- c("high", "high", "low", "medium", "low", "high", "medium", "high")                  
value <- c(22,40,72,41,36,16,88,99)

fake_df <- data.frame(grouping1, grouping2, value)

Fake code example:

library(dplyr)

by_group_fun <- function(fun.data.in, fun.grouping.factor){
  fake_df2 <- fun.data.in %>%
    group_by(fun.grouping.factor) %>%
    summarize(mean = mean(value), median = median(value))
  fake_df2
}
by_group_fun(fake_df, grouping1) 
by_group_fun(fake_df, grouping2)

This gives me:

 Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `fun.grouping.factor` is unknown

Second try

I tried to assign the by variable selected in the function to a new variable and carry that forward.

Fake code example (second try):

by_group_fun2 <- function(fun.data.in, fun.grouping.factor){
  fun.data.in$by_var <- fun.data.in$fun.grouping.factor

  fake_df2 <- fun.data.in %>%
    group_by(by_var) %>%
    summarize(mean = mean(value), median = median(value))
  fake_df2
}

by_group_fun2(fake_df, grouping1) 
by_group_fun2(fake_df, grouping2)

The second try gives me:

 Error in grouped_df_impl(data, unname(vars), drop) : 
  Column `by_var` is unknown

There are 2 answers

alistaire On 13 September 2017 at 01:46

A really simple way to get the same output without resorting to programming with dplyr is to gather the grouping columns to long form. Grouping by both the resulting key and value columns will get all the combinations you're asking for without moving beyond a single data.frame:

library(tidyverse)

# tibble() replaces the deprecated data_frame()
fake_df <- tibble(grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
                  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
                  value = c(22,40,72,41,36,16,88,99))

fake_df %>% 
    gather(group_var, group_val, -value) %>% 
    group_by(group_var, group_val) %>% 
    summarise(mean = mean(value), 
              median = median(value))
#> # A tibble: 6 x 4
#> # Groups:   group_var [?]
#>   group_var group_val     mean median
#>       <chr>     <chr>    <dbl>  <dbl>
#> 1 grouping1      blue 42.66667   40.0
#> 2 grouping1     green 70.00000   70.0
#> 3 grouping1       red 48.66667   36.0
#> 4 grouping2      high 44.25000   31.0
#> 5 grouping2       low 54.00000   54.0
#> 6 grouping2    medium 64.50000   64.5
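
In current tidyr releases `gather()` is marked as superseded in favour of `pivot_longer()`. A minimal sketch of the same reshaping follows, assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the `.groups` argument). The name `long_summary` is only illustrative and does not come from the answer above.

```r
library(dplyr)
library(tidyr)

fake_df <- tibble(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value     = c(22, 40, 72, 41, 36, 16, 88, 99)
)

long_summary <- fake_df %>%
  # Stack every column except `value` into name/value pairs
  pivot_longer(-value, names_to = "group_var", values_to = "group_val") %>%
  # One group per (grouping column, level) combination
  group_by(group_var, group_val) %>%
  summarise(mean = mean(value), median = median(value), .groups = "drop")

long_summary
```

This should give the same six rows as the `gather()` version, one per level of each grouping column.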

CPak On 13 September 2017 at 01:20 (accepted answer)

Use this example to guide you:

myfun <- function(df, thesecols) {
  require(dplyr)
  thesecols <- enquo(thesecols)     # capture the bare column name as a quosure
  df %>%
    group_by_at(vars(!!thesecols))  # !! unquotes it inside vars()
}

myfun(fake_df, grouping1)

Output

# A tibble: 8 x 3
# Groups:   grouping1 [3]
  grouping1 grouping2 value
     <fctr>    <fctr> <dbl>
1       red      high    22
2      blue      high    40
3      blue       low    72
4     green    medium    41
5       red       low    36
6      blue      high    16
7       red    medium    88
8     green      high    99
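
The example above stops at the grouping step. Both attempts in the question fail for the same reason: `group_by()` and `$` treat what they are given as a literal column name, so they look for a column called `fun.grouping.factor` rather than the column passed in. In the second try `fun.data.in$fun.grouping.factor` is `NULL`, so `by_var` is never created. Here is a minimal sketch of the full function from the question, assuming dplyr 1.0.0 or later and rlang 0.4.0 or later (for the `{{ }}` operator). The helper `by_group_chr` is a hypothetical addition for the case where the column name arrives as a string.

```r
library(dplyr)

fake_df <- data.frame(
  grouping1 = c("red", "blue", "blue", "green", "red", "blue", "red", "green"),
  grouping2 = c("high", "high", "low", "medium", "low", "high", "medium", "high"),
  value     = c(22, 40, 72, 41, 36, 16, 88, 99)
)

# Bare column name: {{ }} does the enquo() + !! pair in one step
by_group_fun <- function(fun.data.in, fun.grouping.factor) {
  fun.data.in %>%
    group_by({{ fun.grouping.factor }}) %>%
    summarise(mean = mean(value), median = median(value), .groups = "drop")
}

# Column name as a string (hypothetical helper): select it with all_of()
by_group_chr <- function(fun.data.in, col_name) {
  fun.data.in %>%
    group_by(across(all_of(col_name))) %>%
    summarise(mean = mean(value), median = median(value), .groups = "drop")
}

by_group_fun(fake_df, grouping1)
by_group_chr(fake_df, "grouping2")
```

The first call should return one row each for blue, green and red, and the second one row each for high, low and medium, with the same means and medians as the long-form summary in the other answer.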
