Combine (summarize) certain rows based on a category


I'm working with a simple dataframe in R, similar to this one:

data <- data.frame(
  "food"= c("Banana", "Orange", "Apple", "Meat", "Fish", "Cherries", "Wheat"),
  "kg"= c(2,3,1,2,6,4,5)
)

I would like to create a new row called "Fruits" whose kg value is the sum of the "Banana", "Orange", "Apple" and "Cherries" rows (I would then delete those four rows and keep only "Fruits").

The closest I've come to a solution is this attempt:

library(tidyverse)
data <- data %>% 
  add_row(.data = data.frame(food="Fruits",
                             total=sum(data$kg[c=1,2,3,6])))

# Error message: 'incorrect number of dimensions'
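The error comes from `data$kg[c=1,2,3,6]`: inside `[ ]` the commas separate dimensions, so R tries to index the one-dimensional `kg` vector as if it had four dimensions. The row positions need to be wrapped in `c(...)`. In addition, `add_row()` only accepts columns that already exist, so the new value has to be named `kg` rather than `total`. A minimal corrected sketch of the same idea, assuming R >= 4.0 (so `food` is character, not factor) and keeping the positional indices from the attempt:

```r
library(tibble)  # provides add_row()

data <- data.frame(
  food = c("Banana", "Orange", "Apple", "Meat", "Fish", "Cherries", "Wheat"),
  kg   = c(2, 3, 1, 2, 6, 4, 5)
)

# positions of Banana, Orange, Apple and Cherries in this particular data frame
fruit_rows <- c(1, 2, 3, 6)

# append one row whose kg is the sum of the fruit rows (column names must match)
data <- add_row(data, food = "Fruits", kg = sum(data$kg[fruit_rows]))

# drop the original fruit rows, keeping the new "Fruits" row at the end
data <- data[-fruit_rows, ]
data
```

Positional indices break silently if the row order changes; matching on names, e.g. `data$food %in% c("Banana", "Orange", "Apple", "Cherries")`, is more robust, which is what both answers below do.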

I'm quite new to R, so I don't know how to compute the sum of the rows I want and store it in a new row.


There are 2 answers

Best answer, by M--

Let's say we have two categories (Fruits and Protein), while "Wheat" belongs to neither group and therefore won't be summarized. (If you only want "Fruits" to be summarized, simply drop Protein from the categories list.)

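library(tibble)  # enframe()
library(tidyr)   # unnest()
library(dplyr)   # full_join(), join_by(), summarise(.by = ); needs dplyr >= 1.1.0

categories <- list(Fruits = c("Banana", "Orange", "Apple", "Cherries"), 
                   Protein = c("Meat", "Fish"))

enframe(categories) %>%                             # named list -> tibble(name, value)
  unnest(value) %>%                                 # one row per (category, food) pair
  full_join(data, .,  join_by(food == value)) %>%   # attach each food's category
  mutate(name = coalesce(name, food)) %>%           # uncategorized foods keep their own name
  summarise(food = first(name), kg = sum(kg, na.rm = TRUE), .by = name) %>% 
  select(-name)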

#>      food kg
#> 1  Fruits 10
#> 2 Protein  8
#> 3   Wheat  5

Created on 2024-03-12 with reprex v2.0.2
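To see what the first two steps of that pipeline produce, here is a sketch that stops after `unnest()`. The result is a lookup table with one row per food item and its category, which `full_join()` then matches against `data$food`. The object name `lookup` is only for illustration.

```r
library(tibble)  # enframe()
library(tidyr)   # unnest()

categories <- list(Fruits = c("Banana", "Orange", "Apple", "Cherries"),
                   Protein = c("Meat", "Fish"))

# enframe() gives a two-column tibble: `name` (the list names) and
# `value` (a list-column holding each character vector);
# unnest() expands it to one row per food item
lookup <- unnest(enframe(categories), value)
lookup
```

Foods missing from this table (here "Wheat") get `NA` in `name` after the join, which is why the answer uses `coalesce(name, food)`.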

Answer by EJump

Are you looking for something like this?

library(tidyverse)

data <- data.frame(
  "food"= c("Banana", "Orange", "Apple", "Meat", "Fish", "Cherries", "Wheat"),
  "kg"= c(2,3,1,2,6,4,5)
)

data %>% 
   mutate(type = case_when(food %in% c("Banana", "Orange", "Apple", "Cherries") ~ "Fruits",
                           TRUE ~ "Other")) %>%
   group_by(type) %>%
   summarize(total_kg = sum(kg),
            .groups = "keep") %>%
   ungroup()

# A tibble: 2 × 2
#  type   total_kg
#  <chr>     <dbl>
#1 Fruits       10
#2 Other        13

Here you create an additional variable called type that specifies the type of food. You use the type variable to group your data and then you can sum the kg column by group. summarize() collapses the dataset into one row per value in the type variable.
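The question asks to replace only the fruit rows and keep the other foods as separate rows. A sketch of that variant using the same relabel-then-group idea, assuming dplyr >= 1.1.0 (for `summarise(.by = )`) and R >= 4.0 (character `food`): instead of a separate `type` column, the fruit names themselves are overwritten with "Fruits" before summing.

```r
library(dplyr)

data <- data.frame(
  food = c("Banana", "Orange", "Apple", "Meat", "Fish", "Cherries", "Wheat"),
  kg   = c(2, 3, 1, 2, 6, 4, 5)
)

fruits <- c("Banana", "Orange", "Apple", "Cherries")

result <- data %>%
  # relabel fruit rows as "Fruits"; every other food keeps its own name
  mutate(food = if_else(food %in% fruits, "Fruits", food)) %>%
  # one row per (relabelled) food, in order of first appearance,
  # so the four fruit rows collapse into a single "Fruits" row
  summarise(kg = sum(kg), .by = food)

result
```

Here Meat, Fish and Wheat pass through unchanged and "Fruits" carries 10 kg.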

If you want to keep the food variable, use mutate() instead of summarize(); this adds total_kg to every row. To then keep only the total kg for the Fruits group, filter and use distinct(), as in the second code chunk below:

data1 <- data %>% 
   mutate(type = case_when(food %in% c("Banana", "Orange", "Apple", "Cherries") ~ "Fruits",
                           TRUE ~ "Other")) %>%
   group_by(type) %>%
   mutate(total_kg = sum(kg)) %>%
   ungroup()

data1

# A tibble: 7 × 4
#  food        kg type   total_kg
#  <chr>    <dbl> <chr>     <dbl>
#1 Banana       2 Fruits       10
#2 Orange       3 Fruits       10
#3 Apple        1 Fruits       10
#4 Meat         2 Other        13
#5 Fish         6 Other        13
#6 Cherries     4 Fruits       10
#7 Wheat        5 Other        13

data1 %>%
   filter(type == "Fruits") %>%
   distinct(type, total_kg)  

# A tibble: 1 × 2
#  type   total_kg
#  <chr>     <dbl>
#1 Fruits       10
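For comparison, the same grouped total can be computed without any packages. This is a base R alternative that neither answer uses, sketched here with `ifelse()` to build the `type` label and `aggregate()` to sum `kg` per label; the names `fruits` and `totals` are illustrative.

```r
data <- data.frame(
  food = c("Banana", "Orange", "Apple", "Meat", "Fish", "Cherries", "Wheat"),
  kg   = c(2, 3, 1, 2, 6, 4, 5)
)

fruits <- c("Banana", "Orange", "Apple", "Cherries")

# label each row: "Fruits" for the four fruit items, "Other" for everything else
data$type <- ifelse(data$food %in% fruits, "Fruits", "Other")

# sum kg within each type; aggregate() returns one row per type, sorted by type
totals <- aggregate(kg ~ type, data = data, FUN = sum)
totals
```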