New variable calculation with input from multiple groups in long format

Question

54 views Asked by KidLu At 26 April 2022 at 16:55

I was wondering whether the following calculation is possible using dplyr without transforming my data into wide format. My data looks like the following:

data <- data.frame(ID = c(rep(1:2, 6)),
                   Date = c(rep(as.Date('2022-03-01'), 4), rep(as.Date('2022-03-02'), 4), rep(as.Date('2022-03-03'), 4)),
                   Type = rep(LETTERS[c(1,1,2,2)], 3),
                   Value = c(1,2,101,102,3,4,103,104,5,6,105,106))

My goal is a calculation that uses the type B value on a given day together with the previous day's values of both type A and type B. If the calculation involved only one group, dplyr::lag would be the way to go, but I do not see how to apply it here. I'd like to avoid pivoting my data into wide format.

As an example, I'd like to calculate X = B(t) - A(t-1) * B(t-1), where t denotes the date. My goal in this case would be something like the following data frame:

data_goal <- data.frame(ID = c(rep(1:2, 3)),
                        Date = c(rep(as.Date('2022-03-01'), 2), rep(as.Date('2022-03-02'), 2), rep(as.Date('2022-03-03'), 2)),
                        X = c(NA, NA, 103 - 1 * 101, 104 - 2 * 102, 105 - 3 * 103, 106 - 4 * 104))

If I only needed the daily difference within each type, my solution would be:

data |>
  dplyr::arrange(Date) |>
  dplyr::group_by(ID, Type) |>
  dplyr::mutate(Diff = Value - dplyr::lag(Value, n = 1))

Unfortunately, I have no idea how to extend this to a calculation across types.

Any help is highly appreciated!

Thanks a lot!

I would also be glad to know if this is not possible. In that case I would transform the table into wide format and continue from there. My actual data has many more types, which is why I'd like to avoid that.
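For comparison, the wide-format fallback could look roughly like the sketch below. It assumes exactly one row per ID, Date and Type, and consecutive dates with no gaps, so that the previous row within an ID is the previous day. The name `wide_result` is only illustrative.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  ID    = rep(1:2, 6),
  Date  = as.Date("2022-03-01") + rep(0:2, each = 4),
  Type  = rep(c("A", "A", "B", "B"), 3),
  Value = c(1, 2, 101, 102, 3, 4, 103, 104, 5, 6, 105, 106)
)

wide_result <- data |>
  # One column per type: A and B
  pivot_wider(names_from = Type, values_from = Value) |>
  arrange(ID, Date) |>
  group_by(ID) |>
  # Assumption: the previous row within an ID is the previous day
  mutate(X = B - lag(A) * lag(B)) |>
  ungroup() |>
  select(ID, Date, X) |>
  arrange(Date, ID)
```

With many types, only the columns that enter the formula need to be referenced after pivoting, but the table does grow by one column per type.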

**Yuriy Saraykin** · Accepted Answer · 2022-04-26T17:54:15+00:00

This may be useful:

data <- data.frame(
  ID = c(rep(1:2, 6)),
  Date = c(rep(as.Date('2022-03-01'), 4), rep(as.Date('2022-03-02'), 4), rep(as.Date('2022-03-03'), 4)),
  Type = rep(LETTERS[c(1, 1, 2, 2)], 3),
  Value = c(1, 2, 101, 102, 3, 4, 103, 104, 5, 6, 105, 106)
)

library(tidyverse)

data %>%
  group_by(Date) %>%
  mutate(grp = cur_group_id()) %>%
  ungroup() %>%
  summarise(Diff = map(.x = seq(max(grp)),
                       .f = ~ Value[Type == "B" &
                                      grp == .x] - Value[Type == "A" &
                                                           grp == .x - 1] * Value[Type == "B" &
                                                                                    grp == .x - 1])) %>%
  unnest(Diff) %>%
  add_case(Diff = rep(NA, length(unique(data$ID))), .before = 1) %>%
  add_column(distinct(data, ID, Date), .before = 1)
#> # A tibble: 6 × 3
#>      ID Date        Diff
#>   <int> <date>     <dbl>
#> 1     1 2022-03-01    NA
#> 2     2 2022-03-01    NA
#> 3     1 2022-03-02     2
#> 4     2 2022-03-02  -100
#> 5     1 2022-03-03  -204
#> 6     2 2022-03-03  -310

Created on 2022-04-26 by the reprex package (v2.0.1)
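A possible alternative that stays in long format is to lag within each ID and Type first, then combine the types per ID and Date. This is a minimal sketch rather than the method used above. It assumes one row per ID, Date and Type, and consecutive dates, so that `lag()` returns the previous day's value. The names `Prev` and `result` are illustrative.

```r
library(dplyr)

data <- data.frame(
  ID    = rep(1:2, 6),
  Date  = as.Date("2022-03-01") + rep(0:2, each = 4),
  Type  = rep(c("A", "A", "B", "B"), 3),
  Value = c(1, 2, 101, 102, 3, 4, 103, 104, 5, 6, 105, 106)
)

result <- data |>
  arrange(ID, Type, Date) |>
  group_by(ID, Type) |>
  # Previous observation within the same ID and Type
  # (assumed to be the previous day)
  mutate(Prev = lag(Value)) |>
  # Regroup so that each group holds the A and B rows of one ID and day
  group_by(ID, Date) |>
  summarise(
    X = Value[Type == "B"] - Prev[Type == "A"] * Prev[Type == "B"],
    .groups = "drop"
  ) |>
  arrange(Date, ID)
```

If dates can be missing, a self-join on `Date - 1` would be a safer way to find the previous day than `lag()`.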
