how to aggregate data that occured within the same time window?

Question

how to aggregate data that occured within the same time window?

44 views Asked by cambar At 12 February 2024 at 17:17

I have the following dataset:

tier	value	begin_ms	end_ms	reaction
ortho	is new	262432	362232	5
words	is	262432	263000	30
metric	A	262432	263000	30
words	new	263000	362232	25
metric	B	263000	362232	25

I was trying to create a new data frame in a more tidy fashion, in which I would have each occurrence of the column ortho and the occurences within the same begin_ms and end_ms as columns. I tried to use

data_spread <- spread(dfgs_final, key = tier, value = value)

but it only partially worked, looking like this:

ortho	begin_ms	end_ms	words	metric	reaction
is new	262432	362232			5
	262432	263000	is	A	30
	263000	362232	new	B	25

Is there a way to group everything that is within the begin_ms and end_ms of the ortho column? I have something like this in mind:

ortho	begin_ms	end_ms	words	metric	reaction
is new	262432	362232			5
is new	262432	263000	is	A	30
is new	263000	362232	new	B	25

Original Q&A

There are 1 answers

**r2evans** · Accepted Answer · 2024-02-12T17:32:25+00:00

FYI, spread has been retired/superseded since Aug 2019 (4.5 years ago) and its replacement pivot_wider is much more powerful, I suggest you migrate to that. The equivalent code is here, plus I'm adding an id field for the second code block.

library(dplyr)
library(tidyr)
wide <- pivot_wider(quux, id_cols = c("begin_ms", "end_ms", "reaction"),
                    names_from = "tier", values_from = "value") |>
  mutate(id = row_number())
wide
# # A tibble: 3 × 7
#   begin_ms end_ms reaction ortho  words metric    id
#      <int>  <int>    <int> <chr>  <chr> <chr>  <int>
# 1   262432 362232        5 is new NA    NA         1
# 2   262432 263000       30 NA     is    A          2
# 3   263000 362232       25 NA     new   B          3

From here, we can do a range-based join on the rows with is.na(ortho) on those that are not NA, reassign ortho, and combine back with the data.

filter(wide, is.na(ortho)) |>
  right_join(filter(wide, !is.na(ortho)),
             join_by(between(x$begin_ms, y$begin_ms, y$end_ms),
                     between(x$end_ms, y$begin_ms, y$end_ms)),
             suffix = c("", ".y")) |>
  mutate(ortho = ortho.y[id.y]) |>
  select(id, ortho) |>
  right_join(wide, by = "id", suffix = c(".y", "")) |>
  mutate(ortho = coalesce(ortho, ortho.y)) |>
  select(-ortho.y) |>
  arrange(id)
# # A tibble: 3 × 7
#      id begin_ms end_ms reaction ortho  words metric
#   <int>    <int>  <int>    <int> <chr>  <chr> <chr> 
# 1     1   262432 362232        5 is new NA    NA    
# 2     2   262432 263000       30 is new is    A     
# 3     3   263000 362232       25 is new new   B

In the end, it might be preferred for you to have unique id fields for each row; if you want to retain the "parent" id, we can do that with little change.

filter(wide, is.na(ortho)) |>
  right_join(filter(wide, !is.na(ortho)),
             join_by(between(x$begin_ms, y$begin_ms, y$end_ms),
                     between(x$end_ms, y$begin_ms, y$end_ms)),
             suffix = c("", ".y")) |>
  mutate(ortho = ortho.y[id.y]) |>
  select(id, parentid = id.y, ortho) |>               # THE ONLY CHANGE
  right_join(wide, by = "id", suffix = c(".y", "")) |>
  mutate(ortho = coalesce(ortho, ortho.y)) |>
  select(-ortho.y) |>
  arrange(id)
# # A tibble: 3 × 8
#      id parentid begin_ms end_ms reaction ortho  words metric
#   <int>    <int>    <int>  <int>    <int> <chr>  <chr> <chr> 
# 1     1       NA   262432 362232        5 is new NA    NA    
# 2     2        1   262432 263000       30 is new is    A     
# 3     3        1   263000 362232       25 is new new   B

Data

quux <- structure(list(tier = c("ortho", "words", "metric", "words", "metric"), value = c("is new", "is", "A", "new", "B"), begin_ms = c(262432L, 262432L, 262432L, 263000L, 263000L), end_ms = c(362232L, 263000L, 263000L, 362232L, 362232L), reaction = c(5L, 30L, 30L, 25L, 25L)), class = "data.frame", row.names = c(NA, -5L))

TechQA.

how to aggregate data that occured within the same time window?

There are 1 answers

Related Questions in R

Related Questions in SPLIT

Related Questions in AGGREGATE

Related Questions in SPREAD

Popular Questions

Trending Questions