Create a new column with the sum of the values from some columns in a tibble on R

90 views Asked by At

I have a tibble like Table1, and I want to group some columns by the sum of their values for every row, creating a new column with the results and replacing the summed columns, like Table2.

Table1
| Col_A | INTER   | Col_B | Col_C | Col_D |
| ----- | ------- | ----- | ----- | ----- |
|   1   | [5-15)  |   2   |   4   |   2   |
|   1   | [15-25) |   1   |   3   |   6   |
|   1   | [25-35) |   1   |   1   |   2   |

In this example I'd sum values from Col_B and Col_D, saving the results in Col_E.

Table2  
| Col_A | INTER | Col_C | Col_E |
| ---   | ---   | ---   | ---   |
|   1   |   [5-15)   |   4   |  4    |
|   1   |   [15-25)   |   3   |  7    |
|   1   |   [25-35)   |   1   |  3    |

In my real data I have several columns (numeric variables) and they have NAs.

I tried:

Table2 <- Table1 %>% mutate(Col_E = rowSums(across(c(Col_B, Col_D))))

I got this error:

Error: Problem with mutate() column Col_E.
i Col_E = rowSums(...).
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type logical. i It must be numeric or character. i The error occurred in group 1: Col_A = 1, INTER = "[5,15)".

2

There are 2 answers

0
maike On

Combining some of the input from comments before, adding a reprex with slightly modified data to address NA removal as well as examples for tidy column selection and how to pre-specify the name of the new column (useful, if you have a couple of column groups you want to aggregate and e.g. keep everything in a list to iterate over).

Here, I remove the columns that where aggregated in the last step using select(- ...), but you may also want to look into transmute(), if you are interested in keeping only the new column, but from what I read, this is not the case here.

require(dplyr)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- tribble(
~Col_A,  ~INTER,   ~ Col_B,~ Col_C,  ~Col_D ,
1     , '[5-15)'  , 2     , 4     , 2     ,
1     , '[15-25)' , 1     , 3     , NA     ,
1     , '[25-35)' , 1     , 1     , 2     
)


# pre-defined vector of column names
cols_add <- c('Col_B', 'Col_C',  'Col_D')
df %>% 
  mutate(!! col_sum := rowSums(across(all_of(cols_add)), na.rm = TRUE)) %>% 
  select(-all_of(cols_add))
#> # A tibble: 3 × 3
#>   Col_A INTER   Col_E
#>   <dbl> <chr>   <dbl>
#> 1     1 [5-15)      8
#> 2     1 [15-25)     4
#> 3     1 [25-35)     4

# you may also predefine the name of the result column, use !! and := in mutate 
col_sum  <- 'Col_E'

# using tidyselect with column index
df %>% 
  mutate(!! col_sum := rowSums(across(3:5), na.rm = TRUE)) %>% 
  select(-all_of(cols_add))
#> # A tibble: 3 × 3
#>   Col_A INTER   Col_E
#>   <dbl> <chr>   <dbl>
#> 1     1 [5-15)      8
#> 2     1 [15-25)     4
#> 3     1 [25-35)     4

# using tidyselect with regular expression
df %>% 
  mutate(!! col_sum := rowSums(across(matches('Col_[BCD]')), na.rm = TRUE)) %>% 
  select(-all_of(cols_add))
#> # A tibble: 3 × 3
#>   Col_A INTER   Col_E
#>   <dbl> <chr>   <dbl>
#> 1     1 [5-15)      8
#> 2     1 [15-25)     4
#> 3     1 [25-35)     4

# for multiple groups of columns to sum up
sum_spec <- list(
  col_E = c('Col_A', 'Col_B'),
  col_F = c('Col_C', 'Col_D')
)
 
df2 <- df 
iwalk(sum_spec, ~{
  df2 <<- df2 %>% 
    mutate( !! .y := rowSums(across(all_of(.x)), na.rm = TRUE)) %>% 
    select(-all_of(.x))
})

df2
#> # A tibble: 3 × 3
#>   INTER   col_E col_F
#>   <chr>   <dbl> <dbl>
#> 1 [5-15)      3     6
#> 2 [15-25)     2     3
#> 3 [25-35)     2     3

Created on 2023-11-14 with reprex v2.0.2

0
dq.canario On

I finally used the suggestion of creating a list of variables I want to sum, but then I just used the simple select() function, because I wanted to order my columns in a specific way (different from the Table1, not illustrated in the example data).

cols_add <- (Col_B, Col_D)

Table2 <- Table1 %>%
 mutate(Col_E = rowSums(across(all_of(cols_add)), na.rm = TRUE)) %>%
 select(Col_A, INTER, Col_C, Col_E)

But in the case I want the default order of the columns from Table1, I'd have used this:

cols_add <- (Col_B, Col_D)

Table2 <- Table1 %>%
 mutate(Col_E = rowSums(across(all_of(cols_add)), na.rm = TRUE)) %>%
 select(!cols_add)

Thank you all for your help