How to write an R function to pivot the last n minutes?

42 views Asked by At

I have per-minute data that looks like this, with 30 different unique IDs. Each row contains the per-minute measurement, and a value of 1 if the measurement was verified by a technician.

ID dttm Measurement Verified?
1 1-1-2023 08:00 89 1
1 1-1-2023 08:01 90 0
1 1-1-2023 08:02 96 0
1 1-1-2023 08:03 92 0
1 1-1-2023 08:04 98 0
1 1-1-2023 08:05 88 0
1 1-1-2023 08:06 86 0
1 1-1-2023 08:07 91 0
1 1-1-2023 08:08 94 0
1 1-1-2023 08:09 100 0
1 1-1-2023 08:10 81 0
1 1-1-2023 08:11 82 0
1 1-1-2023 08:12 95 0
1 1-1-2023 08:13 90 0
1 1-1-2023 08:14 87 0
1 1-1-2023 08:15 91 1
1 1-1-2023 08:16 89 0
1 1-1-2023 08:17 90 0
1 1-1-2023 08:18 96 0
1 1-1-2023 08:19 92 0
1 1-1-2023 08:20 98 0
1 1-1-2023 08:21 88 0
1 1-1-2023 08:22 86 0
1 1-1-2023 08:23 91 0
1 1-1-2023 08:24 94 0
1 1-1-2023 08:25 100 0
1 1-1-2023 08:26 81 0
1 1-1-2023 08:27 82 0
1 1-1-2023 08:28 95 0
1 1-1-2023 08:29 90 0
1 1-1-2023 08:30 87 0
1 1-1-2023 08:31 91 0
1 1-1-2023 08:32 89 0
1 1-1-2023 08:33 90 0
1 1-1-2023 08:34 96 0
1 1-1-2023 08:35 92 0
1 1-1-2023 08:36 98 0
1 1-1-2023 08:37 88 0
1 1-1-2023 08:38 86 0
1 1-1-2023 08:39 91 0
1 1-1-2023 08:40 94 0
1 1-1-2023 08:41 100 0
1 1-1-2023 08:42 81 0
1 1-1-2023 08:43 82 0
1 1-1-2023 08:44 95 0
1 1-1-2023 08:45 90 0
1 1-1-2023 08:46 87 0
1 1-1-2023 08:47 91 0
1 1-1-2023 08:48 89 0
1 1-1-2023 08:49 90 0
1 1-1-2023 08:50 96 0
1 1-1-2023 08:51 92 0
1 1-1-2023 08:52 98 0
1 1-1-2023 08:53 88 0
1 1-1-2023 08:54 86 0
1 1-1-2023 08:55 91 0
1 1-1-2023 08:56 94 0
1 1-1-2023 08:57 100 0
1 1-1-2023 08:58 81 0
1 1-1-2023 08:59 82 0
1 1-1-2023 09:00 95 1
1 1-1-2023 09:01 90 0
1 1-1-2023 09:02 87 0
1 1-1-2023 09:03 91 0

And so on, for a total of 2880 minutes per ID.

What I am looking to achieve is to pivot the previous 59 minutes around verified values so that there is a column for each of the preceding 59 minutes.

For example, since there are 59 minutes preceeding the value at 1-1-2023 09:00 for ID 1, the result would be:

ID Verified_Time min_1 min_2 min_3 min_4 min_5 min_6 min_7 min_8 ... min_59 verified
1 1-1-2023 09:00 90 96 92 98 88 86 91 94 ... 82 95

I understand how to do a simple pivot wider around a column, but I am not sure how to pivot if the value is verified and there are 59 preceeding values.

1

There are 1 answers

0
Jan On BEST ANSWER

Since you know how to pivot_wider around a column, we only need to construct something like a list whose entries are the subsets of the data around the relevant dates.

The relevant dates are those who are verified and have 59 preceding values inside df. They can be obtained by the following. In my example I used your data and additionally set the date at 09:02 as verified.

library(dplyr)

df |>
  filter(
    `Verified?` == 1 &
      dttm - as.difftime(59, units = "mins") > min(dttm),
    .by = ID
  ) |>
  pull(dttm)

# [1] "2023-01-01 09:00:00 CET" "2023-01-01 09:02:00 CET"

For such a VerifiedDate, the relevant data subset can be obtained by

df |> filter(between(
  dttm,
  VerifiedDate - as.difftime(59, units = "mins"),
  VerifiedDate
))

Putting it together using purrr::map yields the desired list of dataframes. On each of the dataframes you can now use pivoting in order to get the desired output.

purrr::map(df |>
             filter(
               `Verified?` == 1 &
                 dttm - as.difftime(59, units = "mins") > min(dttm),
               .by = ID
             ) |>
             pull(dttm)
           , function (VerifiedDate) {
             df |> filter(between(
               dttm,
               VerifiedDate - as.difftime(59, units = "mins"),
               VerifiedDate
             ))
           })

#[[1]]
# A tibble: 60 × 4
#      ID dttm                Measurement `Verified?`
#   <int> <dttm>                    <int>       <int>
# 1     1 2023-01-01 08:01:00          90           0

...

#[[2]]
# A tibble: 60 × 4
#      ID dttm                Measurement `Verified?`
#   <int> <dttm>                    <int>       <int>
# 1     1 2023-01-01 08:03:00          92           0

Used data:

> dput(df)
structure(list(ID = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), dttm = structure(c(1672556400, 1672556460, 
1672556520, 1672556580, 1672556640, 1672556700, 1672556760, 1672556820, 
1672556880, 1672556940, 1672557000, 1672557060, 1672557120, 1672557180, 
1672557240, 1672557300, 1672557360, 1672557420, 1672557480, 1672557540, 
1672557600, 1672557660, 1672557720, 1672557780, 1672557840, 1672557900, 
1672557960, 1672558020, 1672558080, 1672558140, 1672558200, 1672558260, 
1672558320, 1672558380, 1672558440, 1672558500, 1672558560, 1672558620, 
1672558680, 1672558740, 1672558800, 1672558860, 1672558920, 1672558980, 
1672559040, 1672559100, 1672559160, 1672559220, 1672559280, 1672559340, 
1672559400, 1672559460, 1672559520, 1672559580, 1672559640, 1672559700, 
1672559760, 1672559820, 1672559880, 1672559940, 1672560000, 1672560060, 
1672560120, 1672560180), class = c("POSIXct", "POSIXt"), tzone = ""), 
    Measurement = c(89L, 90L, 96L, 92L, 98L, 88L, 86L, 91L, 94L, 
    100L, 81L, 82L, 95L, 90L, 87L, 91L, 89L, 90L, 96L, 92L, 98L, 
    88L, 86L, 91L, 94L, 100L, 81L, 82L, 95L, 90L, 87L, 91L, 89L, 
    90L, 96L, 92L, 98L, 88L, 86L, 91L, 94L, 100L, 81L, 82L, 95L, 
    90L, 87L, 91L, 89L, 90L, 96L, 92L, 98L, 88L, 86L, 91L, 94L, 
    100L, 81L, 82L, 95L, 90L, 87L, 91L), `Verified?` = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 
    0L, 1L, 0L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-64L))