How to perform a R untidy sum?

79 views Asked by At

Update

I have untidy data that looks like the example df2.

df2 = data.frame(label = c("C360", "C360~C183", "C157~C360", "C183", "C157", "C157~C183", "C195~C183"), 
                 values = c(10, 11, 12, 13, 14, 15, 16))
> df2
      label values
1      C360     10
2 C360~C183     11
3 C157~C360     12
4      C183     13
5      C157     14
6 C157~C183     15
7 C195~C183     16

I need to sum partial matches that include the letters in different positions, so that I will the following desired output:

  string sum
1   C360  33
2   C183  55
3   C157  41
4   C195  16

Old

df1 = data.frame(label = c("a", "a.1", "2.a", "b", "c"), 
                 values = c(10, 11, 12, 13, 14))
> df1
  label values
1     a     10
2   a.1     11
3   2.a     12
4     b     13
5     c     14

Expected output:

   label sum
1      a  33
2      b  13
3      c  14
3

There are 3 answers

2
Friede On BEST ANSWER

New

stack(setNames(strsplit(df2$label, "~"), df2$values)) |>
  type.convert(as.is = TRUE) |>
  `colnames<-`(names(df2)) |>
  aggregate(values ~ label, sum)

  label values
1  C157     41
2  C183     55
3  C195     16
4  C360     33

Data:

df2 = data.frame(label = c("C360", "C360~C183", "C157~C360", "C183", "C157", "C157~C183", "C195~C183"), 
                 values = c(10, 11, 12, 13, 14, 15, 16))

Old

For df1, you might consider

aggregate(df1$values, list(letter = gsub("[^a-z]", "", df1$label)), sum)

  letter  x
1      a 33
2      b 13
3      c 14

where the appropriate regex (something more concrete than "[^a-z]") depends on your letter (labels) column.

0
zephryl On

For your updated example, you can use tidyr::separate_rows():

library(tidyr)
library(dplyr)

df2 %>%
  separate_rows(label, sep = "~") %>%
  group_by(string = label) %>%
  summarize(sum = sum(values))
# # A tibble: 4 × 2
#   string   sum
#   <chr>  <dbl>
# 1 C157      41
# 2 C183      55
# 3 C195      16
# 4 C360      33

For your original example, you can use stringr::str_extract() to extract and group by letters:

library(dplyr)
library(stringr)

df1 %>%
  group_by(letter = str_extract(label, "[a-z]")) %>%
  summarize(sum = sum(values))
# # A tibble: 3 × 2
#   letter   sum
#   <chr>  <dbl>
# 1 a         33
# 2 b         13
# 3 c         14
2
Adriano Mello On

How about:

df2 <- df1 %>%
  mutate(letter = str_remove_all(label, "[^[:alpha:]]")) %>% 
  summarise(.by = letter, sum = sum(values))

# ----------
> df2
  letter sum
1      a  33
2      b  13
3      c  14