I may not be searching with the right terms, but I can't find a post about this.
I have a data frame:
df <- data.frame(grouping_letter = c('A', 'A', 'B', 'B', 'C', 'C'), grouping_animal = c('Cat', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog'), value = c(1,2,3,4,5,6))
I want to group by grouping_letter and by grouping_animal. I want to do this using dplyr.
If I did it separately, it would be:
df %>% group_by(grouping_letter) %>% summarise(sum(value))
df %>% group_by(grouping_animal) %>% summarise(sum(value))
Now let's say, I have hundreds of columns I need to group by individually. How can I do this?
I was trying:
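grouping_columns <- c('grouping_letter', 'grouping_animal')
results <- list()
for (i in grouping_columns) {
  # df$i looks for a column literally named "i"; across(all_of(i)) selects the column named by the string in i
  results[[i]] <- df %>% group_by(across(all_of(i))) %>% summarise(sum(value))
}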
I got a list called results with the output. I am wondering if there is a better way to do this instead of using a for-loop?
We can create an index of the 'grouping' columns (using grep), loop over that index (with lapply), and get the sum of 'value' separately after grouping by each column in the index.
Or we can convert the dataset from 'wide' to 'long' format, then group by the concerned columns and get the sum of 'value'.
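The answer's original code is not shown here, so the following is only a minimal sketch of the two approaches it describes, using the df from the question. It assumes dplyr 1.0 or later (for across()) and tidyr 1.0 or later (for pivot_longer(), used here as one way to do the wide-to-long reshape). The names results_long, grouping_column, group and total are made up for illustration.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  grouping_letter = c('A', 'A', 'B', 'B', 'C', 'C'),
  grouping_animal = c('Cat', 'Dog', 'Cat', 'Dog', 'Cat', 'Dog'),
  value = c(1, 2, 3, 4, 5, 6)
)

# Index of the grouping columns, picked out by their name prefix
grouping_columns <- grep('^grouping', names(df), value = TRUE)

# Approach 1: loop over the column names with lapply,
# grouping by one column at a time and summing 'value'
results <- lapply(grouping_columns, function(col) {
  df %>%
    group_by(across(all_of(col))) %>%
    summarise(total = sum(value), .groups = 'drop')
})
names(results) <- grouping_columns

# Approach 2: reshape from wide to long so every grouping column
# becomes a (grouping_column, group) pair, then group once and sum
results_long <- df %>%
  pivot_longer(all_of(grouping_columns),
               names_to = 'grouping_column',
               values_to = 'group') %>%
  group_by(grouping_column, group) %>%
  summarise(total = sum(value), .groups = 'drop')
```

The first approach returns a named list with one summary per grouping column, much like the for-loop in the question. The second returns a single data frame with all summaries stacked. Note that pivot_longer() needs the grouping columns to share a type (they are all character here); with mixed types they would likely have to be converted to character first.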