Aggregate column data by date in r

28 views Asked by At

From the given dataframe i would like to aggregate data by date column.

date <- c("2020-01-10", "2020-01-10", "2020-01-10", "2020-01-10", 
"2020-01-10", "2020-01-11","2020-01-11", "2020-01-11", "2020-01-11","2020-01-11",
 "2020-01-12", "2020-01-12", "2020-01-12", "2020-01-12", "2020-01-12", 
"2020-01-13","2020-01-13", "2020-01-13", "2020-01-13","2020-01-13")
ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
assets <- c(1, 2, 3, 10, 9, 21, 23,1, 1, 3, 11,11, 6,12,13,5,2 ,15,12,12)
category <- c("new", "new", "new","new","new", 
"new", "new", "old", "old", "old", "old", "old", "old", "expired","expired","expired","expired"
,"expired","expired","expired")

df1 <- data.frame(ID, date, assets, category)

This code transforming the dataframe and also also adds new column total which is sum of new + old + expired

df_transformed <- df1 %>%
  #group_by(date)>
  pivot_wider(names_from = category, values_from = assets)%>%
  mutate(total = c(new + old + expired))
print(df_transformed)
      ID date         new   old expired total
   <dbl> <chr>      <dbl> <dbl>   <dbl> <dbl>
 1     1 2020-01-10     1    NA      NA    NA
 2     2 2020-01-10     2    NA      NA    NA
 3     3 2020-01-10     3    NA      NA    NA
 4     4 2020-01-10    10    NA      NA    NA
 5     5 2020-01-10     9    NA      NA    NA
 6     6 2020-01-11    21    NA      NA    NA
 7     7 2020-01-11    23    NA      NA    NA
 8     8 2020-01-11    NA     1      NA    NA
 9     9 2020-01-11    NA     1      NA    NA
10    10 2020-01-11    NA     3      NA    NA
11    11 2020-01-12    NA    11      NA    NA
12    12 2020-01-12    NA    11      NA    NA
13    13 2020-01-12    NA     6      NA    NA
14    14 2020-01-12    NA    NA      12    NA
15    15 2020-01-12    NA    NA      13    NA
16    16 2020-01-13    NA    NA       5    NA
17    17 2020-01-13    NA    NA       2    NA
18    18 2020-01-13    NA    NA      15    NA
19    19 2020-01-13    NA    NA      12    NA
20    20 2020-01-13    NA    NA      12    NA

I would like to achieve the following outcome. Original dataframe has a lots of NAs so i am hoping to get the sum for those date to be zero.

      ID date           new           old              expired         total

 1     1 2020-01-10   sum of (01-10)  sum of (01-10)  sum of (01-10)   new + old + expired 
 2     2 2020-01-11   sum of (01-11)  sum of (01-11)  sum of (01-11)   new + old + expired
 3     3 2020-01-12   sum of (01-12)  sum of (01-12)  sum of (01-12)   new + old + expired
 4     4 2020-01-13   sum of (01-13)  sum of (01-13)  sum of (01-13)   new + old + expired

1

There are 1 answers

0
iamakhilverma On

You may want to summarise across everything to handle NAs within sum() function

df_transformed |>
    group_by(date) |>
    summarise(across(everything(), .f = sum, na.rm = TRUE)) |>
    mutate(total = new + old + expired)

Output:

A tibble: 4 x 6
date    ID  new old expired total
<chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
2020-01-10  15  25  0   0   25
2020-01-11  40  44  5   0   49
2020-01-12  65  0   28  25  53
2020-01-13  90  0   0   46  46