Grouping & summarizing data frame by multiple different columns in R

1.2k views Asked by At

I don't know if I am not searching with the right terms but I can't find a post about this.

I have a df :

df <- data.frame(grouping_letter = c('A', 'A', 'B', 'B', 'C', 'C'), grouping_animal = c('Cat', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog'), value = c(1,2,3,4,5,6))

I want to group by grouping_letter and by grouping_animal. I want to do this using dplyr.

If I did it separately, it would be :

df %>% group_by(grouping_letter) %>% summarise(sum(value))

df %>% group_by(grouping_animal) %>% summarise(sum(value))

Now let's say, I have hundreds of columns I need to group by individually. How can I do this?

I was trying:

results <- NULL for (i in grouping_columns) { results[[i]] <- df %>% group_by(df$i) %>% summarize(sum(value)) }

I got a list called results with the output. I am wondering if there is a better way to do this instead of using a for-loop?

1

There are 1 answers

3
akrun On

We can create an index of 'grouping' colums (using grep), loop over the index (with lapply) and separately get the sum of 'value' after grouping by the column in the 'index'.

 library(dplyr)
 i1 <- grep('grouping', names(df))
 lapply(i1, function(i) 
        df[setdiff(seq_along(df), i)] %>% 
                     group_by_(.dots=names(.)[1]) %>% 
                     summarise(Sumvalue= sum(value)))
#[[1]]
#Source: local data frame [2 x 2]

#  grouping_animal Sumvalue
#           (fctr)    (dbl)
#1             Cat        9
#2             Dog       12

#[[2]]
#Source: local data frame [3 x 2]

#  grouping_letter Sumvalue
#           (fctr)    (dbl)
#1               A        3
#2               B        7
#3               C       11

Or we can do this by converting the dataset from 'wide' to 'long' format, then group by the concerned columns and get the sum of 'value'.

library(tidyr)
gather(df, Var, Group, -value) %>% 
         group_by(Var, Group) %>% 
         summarise(Sumvalue = sum(value))
#              Var Group Sumvalue
#            (chr) (chr)    (dbl)
#1 grouping_animal   Cat        9
#2 grouping_animal   Dog       12
#3 grouping_letter     A        3
#4 grouping_letter     B        7
#5 grouping_letter     C       11