removing data based on multiple date columns in R

73 views Asked by At

I have a dataframe which includes the following variables: covid alert date, vax dose 1 to vax dose 5 (these are vaccine brands), date dose 1 to date dose 5 (these are the vaccination dates).

The dataset includes all covid vaccines they have ever received however I really only want to keep the vaccine brands and dates that fall before their covid alert date. Is there a way to look at the covid alert date for each row and only keep the vaccine brands and dates that fall before that date? Anything after that date can be changed to NA.

sample_data <- tibble(`covid alert date` = as.Date(c('2022-01-01','2022-02-01','2022-03-01','2022-04-01','2022-05-01','2022-06-01','2022-07-01','2022-08-01','2022-09-01','2022-10-01')),             
                     `vax dose 1` = c("Astrazeneca", "Moderna", "Pfizer", "Astrazeneca", "Moderna", "Pfizer", "Astrazeneca", "Moderna", "Pfizer", "Astrazeneca"), 
                     `date dose 1` = as.Date(c('2021-01-01','2021-02-01','2021-03-01','2023-04-01','2023-05-01','2021-06-01','2021-07-01','2021-08-01','2023-09-01','2023-10-01')),
                     `vax dose 2` = c("Astrazeneca", "Moderna", "Pfizer", "Astrazeneca", "Moderna", "Pfizer", NA, NA, NA, NA), 
                     `date dose 2` = as.Date(c('2021-04-01','2021-03-01','2021-05-01','2023-06-01','2023-07-01', '2021-08-01', NA, NA, NA, NA)),
                     `vax dose 3` = c("Pfizer", "Moderna", "Pfizer", "Moderna", NA, NA, NA, NA, NA, NA), 
                     `date dose 3` = as.Date(c('2022-04-01','2021-12-01','2021-12-01','2023-12-01', NA, NA, NA, NA, NA, NA)),
                     `vax dose 4` = c("Pfizer", "Moderna", NA, NA, NA, NA, NA, NA, NA, NA), 
                     `date dose 4` = as.Date(c('2022-12-01','2022-12-01', NA, NA, NA, NA, NA, NA, NA, NA)),
                     `vax dose 5` = c("Moderna", NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                     `date dose 5` = as.Date(c('2023-11-01', NA, NA, NA, NA, NA, NA, NA, NA, NA)))

I am very new to using R so any assistance is greatly appreciated. I have included some sample data above. Thanks in advance.

2

There are 2 answers

0
Lennyy On

With this approach you first make all dates NA that are after covid alert date, and when date dose X is NA, you then make the corresponding vax dose X NA as well

library(tidyverse)
sample_data |> 
  mutate(across(starts_with("date"), \(x) as.Date(if_else(x < `covid alert date`, x, NA))),
         across(starts_with("vax"), \(x) if_else(is.na(get(str_replace(cur_column(), "vax", "date"))), NA, x)))

   `covid alert date` `vax dose 1` `date dose 1` `vax dose 2` `date dose 2` `vax dose 3` `date dose 3` `vax dose 4` `date dose 4` `vax dose 5` `date dose 5`
   <date>             <chr>        <date>        <chr>        <date>        <chr>        <date>        <chr>        <date>        <chr>        <date>       
 1 2022-01-01         Astrazeneca  2021-01-01    Astrazeneca  2021-04-01    NA           NA            NA           NA            NA           NA           
 2 2022-02-01         Moderna      2021-02-01    Moderna      2021-03-01    Moderna      2021-12-01    NA           NA            NA           NA           
 3 2022-03-01         Pfizer       2021-03-01    Pfizer       2021-05-01    Pfizer       2021-12-01    NA           NA            NA           NA           
 4 2022-04-01         NA           NA            NA           NA            NA           NA            NA           NA            NA           NA           
 5 2022-05-01         NA           NA            NA           NA            NA           NA            NA           NA            NA           NA           
 6 2022-06-01         Pfizer       2021-06-01    Pfizer       2021-08-01    NA           NA            NA           NA            NA           NA           
 7 2022-07-01         Astrazeneca  2021-07-01    NA           NA            NA           NA            NA           NA            NA           NA           
 8 2022-08-01         Moderna      2021-08-01    NA           NA            NA           NA            NA           NA            NA           NA           
 9 2022-09-01         NA           NA            NA           NA            NA           NA            NA           NA            NA           NA           
10 2022-10-01         NA           NA            NA           NA            NA           NA            NA           NA            NA           NA 
0
Friede On

The dataset includes all covid vaccines they have ever received however I really only want to keep the vaccine brands and dates that fall before their 'covid alert date'.

If base R is an option and reshape not on your list of favorite functions, here a dubious approach based on the principle split-(l)apply-combine.

cp = \(v1, v2, operator) { s = getFunction(operator)(v1, v2); s[is.na(s)] = FALSE; s }
cbind(xyzzy[1L], 
      lapply(split.default(xyzzy[-1L], gl(ncol(xyzzy[-1L]) / 2L, 2L)), 
             \(x) { x[cp(x[grep("date", colnames(x))], xyzzy[1L], ">"), ] = NA; x }) |> 
        do.call(what = cbind))

gives

   covid.alert.date 1.vax.dose.1 1.date.dose.1 2.vax.dose.2
1        2022-01-01  Astrazeneca    2021-01-01  Astrazeneca
2        2022-02-01      Moderna    2021-02-01      Moderna
3        2022-03-01       Pfizer    2021-03-01       Pfizer
4        2022-04-01         <NA>          <NA>         <NA>
5        2022-05-01         <NA>          <NA>         <NA>
6        2022-06-01       Pfizer    2021-06-01       Pfizer
7        2022-07-01  Astrazeneca    2021-07-01         <NA>
8        2022-08-01      Moderna    2021-08-01         <NA>
9        2022-09-01         <NA>          <NA>         <NA>
10       2022-10-01         <NA>          <NA>         <NA>
   2.date.dose.2 3.vax.dose.3 3.date.dose.3 4.vax.dose.4 4.date.dose.4
1     2021-04-01         <NA>          <NA>         <NA>          <NA>
2     2021-03-01      Moderna    2021-12-01         <NA>          <NA>
3     2021-05-01       Pfizer    2021-12-01         <NA>          <NA>
4           <NA>         <NA>          <NA>         <NA>          <NA>
5           <NA>         <NA>          <NA>         <NA>          <NA>
6     2021-08-01         <NA>          <NA>         <NA>          <NA>
7           <NA>         <NA>          <NA>         <NA>          <NA>
8           <NA>         <NA>          <NA>         <NA>          <NA>
9           <NA>         <NA>          <NA>         <NA>          <NA>
10          <NA>         <NA>          <NA>         <NA>          <NA>
   5.vax.dose.5 5.date.dose.5
1          <NA>          <NA>
2          <NA>          <NA>
3          <NA>          <NA>
4          <NA>          <NA>
5          <NA>          <NA>
6          <NA>          <NA>
7          <NA>          <NA>
8          <NA>          <NA>
9          <NA>          <NA>
10         <NA>          <NA>

Data:

xyzzy = data.frame(`covid alert date` = as.Date(c('2022-01-01','2022-02-01','2022-03-01','2022-04-01','2022-05-01','2022-06-01','2022-07-01','2022-08-01','2022-09-01','2022-10-01')),             
                              `vax dose 1` = c("Astrazeneca", "Moderna", "Pfizer", "Astrazeneca", "Moderna", "Pfizer", "Astrazeneca", "Moderna", "Pfizer", "Astrazeneca"), 
                              `date dose 1` = as.Date(c('2021-01-01','2021-02-01','2021-03-01','2023-04-01','2023-05-01','2021-06-01','2021-07-01','2021-08-01','2023-09-01','2023-10-01')),
                              `vax dose 2` = c("Astrazeneca", "Moderna", "Pfizer", "Astrazeneca", "Moderna", "Pfizer", NA, NA, NA, NA), 
                              `date dose 2` = as.Date(c('2021-04-01','2021-03-01','2021-05-01','2023-06-01','2023-07-01', '2021-08-01', NA, NA, NA, NA)),
                              `vax dose 3` = c("Pfizer", "Moderna", "Pfizer", "Moderna", NA, NA, NA, NA, NA, NA), 
                              `date dose 3` = as.Date(c('2022-04-01','2021-12-01','2021-12-01','2023-12-01', NA, NA, NA, NA, NA, NA)),
                              `vax dose 4` = c("Pfizer", "Moderna", NA, NA, NA, NA, NA, NA, NA, NA), 
                              `date dose 4` = as.Date(c('2022-12-01','2022-12-01', NA, NA, NA, NA, NA, NA, NA, NA)),
                              `vax dose 5` = c("Moderna", NA, NA, NA, NA, NA, NA, NA, NA, NA), 
                              `date dose 5` = as.Date(c('2023-11-01', NA, NA, NA, NA, NA, NA, NA, NA, NA)))