I have the following dataset:
| tier | value | begin_ms | end_ms | reaction |
|---|---|---|---|---|
| ortho | is new | 262432 | 362232 | 5 |
| words | is | 262432 | 263000 | 30 |
| metric | A | 262432 | 263000 | 30 |
| words | new | 263000 | 362232 | 25 |
| metric | B | 263000 | 362232 | 25 |
I am trying to reshape this into a tidier data frame, in which each occurrence of ortho, and the occurrences that fall within the same begin_ms and end_ms, become columns. I tried to use
data_spread <- spread(dfgs_final, key = tier, value = value)
but it only partially worked, looking like this:
| ortho | begin_ms | end_ms | words | metric | reaction |
|---|---|---|---|---|---|
| is new | 262432 | 362232 | | | 5 |
| | 262432 | 263000 | is | A | 30 |
| | 263000 | 362232 | new | B | 25 |
Is there a way to group everything that is within the begin_ms and end_ms of the ortho column? I have something like this in mind:
| ortho | begin_ms | end_ms | words | metric | reaction |
|---|---|---|---|---|---|
| is new | 262432 | 362232 | | | 5 |
| is new | 262432 | 263000 | is | A | 30 |
| is new | 263000 | 362232 | new | B | 25 |
FYI, `spread` has been retired/superseded since Aug 2019 (4.5 years ago), and its replacement, `pivot_wider`, is much more powerful; I suggest you migrate to that. The equivalent code is here, plus I'm adding an `id` field for the second code block.

From here, we can do a range-based join of the rows with `is.na(ortho)` on those that are not `NA`, reassign `ortho`, and combine back with the data.

In the end, it might be preferred for you to have unique `id` fields for each row; if you want to retain the "parent" `id`, we can do that with little change.

Data
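For reference, here is a minimal sketch of the whole approach, with the question's data recreated as `dfgs_final`. It is one possible way to do it, not necessarily the exact code intended above. It assumes dplyr 1.1.0 or later (for `join_by()`) and tidyr 1.0.0 or later (for `pivot_wider()`); the helper names `wide`, `parents`, `children`, `filled`, `result`, `p_begin`, and `p_end` are made up for illustration, and `id` here is simply a unique row number.

```r
library(dplyr)  # join_by() assumes dplyr >= 1.1.0
library(tidyr)  # pivot_wider() assumes tidyr >= 1.0.0

# The data from the question
dfgs_final <- data.frame(
  tier     = c("ortho", "words", "metric", "words", "metric"),
  value    = c("is new", "is", "A", "new", "B"),
  begin_ms = c(262432, 262432, 262432, 263000, 263000),
  end_ms   = c(362232, 263000, 263000, 362232, 362232),
  reaction = c(5, 30, 30, 25, 25),
  stringsAsFactors = FALSE
)

# pivot_wider() counterpart of spread(dfgs_final, key = tier, value = value),
# plus a unique row id
wide <- dfgs_final %>%
  pivot_wider(names_from = tier, values_from = value) %>%
  mutate(id = row_number())

# "Parent" rows carry an ortho value; "child" rows do not
parents <- wide %>%
  filter(!is.na(ortho)) %>%
  select(ortho, p_begin = begin_ms, p_end = end_ms)

children <- wide %>%
  filter(is.na(ortho)) %>%
  select(-ortho)

# Range-based join: a child matches a parent when its interval
# lies inside the parent's [p_begin, p_end] interval
filled <- children %>%
  inner_join(parents, by = join_by(begin_ms >= p_begin, end_ms <= p_end)) %>%
  select(-any_of(c("p_begin", "p_end")))

# Combine the parent rows with the children that now have ortho filled in
result <- wide %>%
  filter(!is.na(ortho)) %>%
  bind_rows(filled) %>%
  arrange(id) %>%
  select(ortho, begin_ms, end_ms, words, metric, reaction, id)

result
```

An inner join drops any child row that falls outside every `ortho` interval; if such rows should be kept with `ortho` left as `NA`, `left_join()` could be used instead.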