R create multiple ranges of dates in tidyverse

226 views Asked by At

I'm trying to find a way to separate out the various intervals in my data so that each row associated with an ID has it's min and max duration, broken up by the months that are NA.

I have data that looks like this, but with 9 columns and 275 rows

df <- data.frame("ID" = c(1:5),
 "jan" = c("2020-01-01",NA, "2020-01-01", "2020-01-01", "2020-01-01"),
 "feb" = c("2020-02-01", "2020-02-01", NA, "2020-02-01", "2020-02-01"),
 "mar" = c("2020-03-01", "2020-03-01", NA, "2020-03-01", NA),
 "apr" = c(NA, "2020-04-01", NA, "2020-04-01", "2020-04-01"), 
 "may" = c("2020-05-01", "2020-05-01", NA ,NA, "2020-05-01"),
"jun" = c("2020-06-01", "2020-06-01", "2020-06-01", NA, NA)
)

Ideally so the columns would be something like:

ID Start1        Stop1          Start2       Stop2
1  "2020-01-01" "2020-03-01"   "2020-05-01"  "2020-06-01"
....

EDIT: I've edited the requirements for this, as A) it was marked as duplicate despite the duplicate problem only being tangentially related, B) Because I really was looking for a tidyverse solution - which is what I got

1

There are 1 answers

1
Ronak Shah On BEST ANSWER

You can get the data in long format, create a new start and stop group at each NA value. For each group select first and last date value and get data in wide format.

library(dplyr)

df %>%
  tidyr::pivot_longer(cols = -ID) %>%
  group_by(ID, grp = cumsum(is.na(value))) %>%
  na.omit() %>%
  summarise(start = first(value), 
            stop = last(value)) %>%
  mutate(grp = row_number()) %>%
  pivot_wider(names_from = grp, values_from = c(start, stop)) %>%
  select(ID, order(readr::parse_number(names(.))))

#     ID start_1    stop_1     start_2    stop_2    
#  <int> <chr>      <chr>      <chr>      <chr>     
#1     1 2020-01-01 2020-03-01 2020-05-01 2020-06-01
#2     2 2020-02-01 2020-06-01 NA         NA        
#3     3 2020-01-01 2020-01-01 2020-06-01 2020-06-01
#4     4 2020-01-01 2020-04-01 NA         NA        
#5     5 2020-01-01 2020-02-01 2020-04-01 2020-05-01