How to divide rows of data into separate groups based on the value of a column in the data set and calculate the total time of each group?

33 views Asked by At

I have telemetry data that lists the detections of an individual whenever they have been in this zone.

For each session, I want to calculate the total time the individual was in that zone. I assume that the individual was in that zone when a detection occurred.

I want to split the rows of data into separate sessions depending on whether the "id56wtimelag" column has a value greater than 86,400.

Then, I want to be able to calculate the number of sessions and the duration of each session in which the person was here. The left column contains only the list of detections. The columns Channel..T, ag.ID, Antenna and Power can be ignored.

      Channel..T ag.ID Antenna Power                dat2 id56wtimelag
9              7    56      A0   206 2022-12-17 16:03:18      NA secs
11             7    56      A0   184 2022-12-17 16:03:31      13 secs
12             7    56      A0   182 2022-12-17 16:03:35       4 secs
13             7    56      A0   180 2022-12-17 16:03:39       4 secs
15             7    56      A0   206 2022-12-17 16:03:55      16 secs
16             7    56      A0   206 2022-12-17 16:03:59       4 secs
19             7    56      A0   169 2022-12-17 16:05:37      98 secs
20             7    56      A0   173 2022-12-17 16:05:41       4 secs
21             7    56      A0   187 2022-12-17 16:05:45       4 secs
17729          7    56      A0   100 2023-01-04 12:42:53 1543028 secs
17730          7    56      A0   103 2023-01-04 12:42:57       4 secs
17731          7    56      A0   118 2023-01-04 12:43:01       4 secs
17732          7    56      A0   103 2023-01-04 12:43:13      12 secs
17733          7    56      A0   102 2023-01-04 12:43:17       4 secs
17734          7    56      A0    96 2023-01-04 12:43:21       4 secs
17738          7    56      A0   106 2023-01-04 12:43:36      15 secs
17739          7    56      A0   108 2023-01-04 12:43:40       4 secs
17742          7    56      A0   111 2023-01-04 12:43:57      17 secs
17743          7    56      A0    95 2023-01-04 12:44:01       4 secs
17744          7    56      A0   101 2023-01-04 12:44:05       4 secs
17748          7    56      A0   106 2023-01-04 12:44:17      12 secs
17749          7    56      A0   105 2023-01-04 12:44:21       4 secs
17750          7    56      A0   105 2023-01-04 12:44:25       4 secs
17753          7    56      A0   103 2023-01-04 12:44:37      12 secs
17754          7    56      A0   100 2023-01-04 12:44:41       4 secs
17755          7    56      A0   103 2023-01-04 12:44:45       4 secs
17759          7    56      A0    96 2023-01-04 12:44:58      13 secs
17760          7    56      A0    93 2023-01-04 12:45:08      10 secs
17763          7    56      A0    95 2023-01-04 12:45:28      20 secs
17765          7    56      A0    86 2023-01-04 12:45:48      20 secs
17767          7    56      A0   103 2023-01-04 12:46:08      20 secs
17769          7    56      A0    85 2023-01-04 12:46:28      20 secs
17772          7    56      A0    89 2023-01-04 12:46:48      20 secs
17774          7    56      A0   102 2023-01-04 12:47:08      20 secs
17776          7    56      A0   109 2023-01-04 12:47:28      20 secs
17777          7    56      A0   103 2023-01-04 12:47:48      20 secs
17778          7    56      A0   102 2023-01-04 12:48:08      20 secs
17779          7    56      A0   100 2023-01-04 12:48:28      20 secs
17780          7    56      A0   107 2023-01-04 12:48:38      10 secs
17781          7    56      A0   102 2023-01-04 12:48:58      20 secs
17782          7    56      A0   100 2023-01-04 12:49:18      20 secs
17783          7    56      A0    94 2023-01-04 12:49:38      20 secs

I'm having trouble splitting the rows into separate groups/sessions based on the criteria when the "id56wtimelag" column has a value greater than 86,400. Then I would need to calculate the duration of each session, ideally contained in a list like this for this dataset:

`Session  Length of each session (seconds)
1  147
2  405`

Data

Data set in dput format.

df1 <-
  structure(list(
    Channel..T = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                   7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                   7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 
                   7L, 7L), 
    ag.ID = c(56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 
              56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 
              56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 56L, 
              56L, 56L, 56L, 56L, 56L, 56L, 56L), 
    Antenna = c("A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", "A0", 
                "A0", "A0", "A0", "A0", "A0", "A0"), 
    Power = c(206L, 184L, 182L, 
              180L, 206L, 206L, 169L, 173L, 187L, 100L, 103L, 118L, 103L, 102L, 
              96L, 106L, 108L, 111L, 95L, 101L, 106L, 105L, 105L, 103L, 100L, 
              103L, 96L, 93L, 95L, 86L, 103L, 85L, 89L, 102L, 109L, 103L, 102L, 
              100L, 107L, 102L, 100L, 94L), 
    dat2 = c("2022-12-17 16:03:18", 
             "2022-12-17 16:03:31", "2022-12-17 16:03:35", "2022-12-17 16:03:39", 
             "2022-12-17 16:03:55", "2022-12-17 16:03:59", "2022-12-17 16:05:37", 
             "2022-12-17 16:05:41", "2022-12-17 16:05:45", "2023-01-04 12:42:53", 
             "2023-01-04 12:42:57", "2023-01-04 12:43:01", "2023-01-04 12:43:13", 
             "2023-01-04 12:43:17", "2023-01-04 12:43:21", "2023-01-04 12:43:36", 
             "2023-01-04 12:43:40", "2023-01-04 12:43:57", "2023-01-04 12:44:01", 
             "2023-01-04 12:44:05", "2023-01-04 12:44:17", "2023-01-04 12:44:21", 
             "2023-01-04 12:44:25", "2023-01-04 12:44:37", "2023-01-04 12:44:41", 
             "2023-01-04 12:44:45", "2023-01-04 12:44:58", "2023-01-04 12:45:08", 
             "2023-01-04 12:45:28", "2023-01-04 12:45:48", "2023-01-04 12:46:08", 
             "2023-01-04 12:46:28", "2023-01-04 12:46:48", "2023-01-04 12:47:08", 
             "2023-01-04 12:47:28", "2023-01-04 12:47:48", "2023-01-04 12:48:08", 
             "2023-01-04 12:48:28", "2023-01-04 12:48:38", "2023-01-04 12:48:58", 
             "2023-01-04 12:49:18", "2023-01-04 12:49:38"), 
    id56wtimelag = c("NA secs", 
                     "13 secs", "4 secs", "4 secs", "16 secs", "4 secs", "98 secs", 
                     "4 secs", "4 secs", "1543028 secs", "4 secs", "4 secs", "12 secs", 
                     "4 secs", "4 secs", "15 secs", "4 secs", "17 secs", "4 secs", 
                     "4 secs", "12 secs", "4 secs", "4 secs", "12 secs", "4 secs", 
                     "4 secs", "13 secs", "10 secs", "20 secs", "20 secs", "20 secs", 
                     "20 secs", "20 secs", "20 secs", "20 secs", "20 secs", "20 secs", 
                     "20 secs", "10 secs", "20 secs", "20 secs", "20 secs")), 
    row.names = c("9", "11", "12", "13", "15", "16", "19", "20", "21", 
                  "17729", "17730", "17731", "17732", "17733", "17734", "17738", "17739", 
                  "17742", "17743", "17744", "17748", "17749", "17750", "17753", "17754", 
                  "17755", "17759", "17760", "17763", "17765", "17767", "17769", 
                  "17772", "17774", "17776", "17777", "17778", "17779", "17780", 
                  "17781", "17782", "17783"), class = "data.frame")
1

There are 1 answers

0
Rui Barradas On

Here is a base R solution.

  1. Remove the string secs the duration column id56wtimelag and coerce it to integer;
  2. Get a logical vector of where id56wtimelag is greater than or equal to 86,400;
  3. Create Length, a copy of that column, we will need it because it contains NA's and by working on a copy, the original remains intact;
  4. assign zero wherever Length is >= 86,400;
  5. and FALSE to all NA's in Session;
  6. Now the sessions numbers. A standard cumsum trick will make a vector increasing every time Session is TRUE.

Then the sessions' durations are computed with aggregate.

df1$id56wtimelag <- gsub("[[:alpha:]]*|[[:space:]]*", "", df1$id56wtimelag) |>
  as.integer()

Session <- df1$id56wtimelag >= 86400L
Length <- df1$id56wtimelag
Length[Session] <- 0L
Session[is.na(Session)] <- FALSE
Session <- cumsum(Session) + 1L
aggregate(Length ~ Session, FUN = sum)
#>   Session Length
#> 1       1    147
#> 2       2    405

Created on 2024-03-11 with reprex v2.1.0