I'm trying to find a way to separate out the various intervals in my data so that each row associated with an ID has it's min and max duration, broken up by the months that are NA.
I have data that looks like this, but with 9 columns and 275 rows
df <- data.frame("ID" = c(1:5),
"jan" = c("2020-01-01",NA, "2020-01-01", "2020-01-01", "2020-01-01"),
"feb" = c("2020-02-01", "2020-02-01", NA, "2020-02-01", "2020-02-01"),
"mar" = c("2020-03-01", "2020-03-01", NA, "2020-03-01", NA),
"apr" = c(NA, "2020-04-01", NA, "2020-04-01", "2020-04-01"),
"may" = c("2020-05-01", "2020-05-01", NA ,NA, "2020-05-01"),
"jun" = c("2020-06-01", "2020-06-01", "2020-06-01", NA, NA)
)
Ideally so the columns would be something like:
ID Start1 Stop1 Start2 Stop2
1 "2020-01-01" "2020-03-01" "2020-05-01" "2020-06-01"
....
EDIT: I've edited the requirements for this, as A) it was marked as duplicate despite the duplicate problem only being tangentially related, B) Because I really was looking for a tidyverse solution - which is what I got
You can get the data in long format, create a new start and stop group at each
NA
value. For each group selectfirst
andlast
date value and get data in wide format.