I would like to check that an individual does not have any gaps in their eligibility status. I define a gap as a date_of_claim that occurs more than 30 days after the last elig_end_date. Therefore, what I would like to do is check that each date_of_claim is no later than the elig_end_date + 30 days in the immediately preceding row. Ideally I would like an indicator, per person, that is 0 where there is no gap and 1 where a gap occurs. Here is a sample df with the solution built in as 'gaps'.
names date_of_claim elig_end_date obs gaps
1 tom 2010-01-01 2010-07-01 1 NA
2 tom 2010-05-04 2010-07-01 1 0
3 tom 2010-06-01 2014-01-01 2 0
4 tom 2010-10-10 2014-01-01 2 0
5 mary 2010-03-01 2014-06-14 1 NA
6 mary 2010-05-01 2014-06-14 1 0
7 mary 2010-08-01 2014-06-14 1 0
8 mary 2010-11-01 2014-06-14 1 0
9 mary 2011-01-01 2014-06-14 1 0
10 john 2010-03-27 2011-03-01 1 NA
11 john 2010-07-01 2011-03-01 1 0
12 john 2010-11-01 2011-03-01 1 0
13 john 2011-02-01 2011-03-01 1 0
14 sue 2010-02-01 2010-04-30 1 NA
15 sue 2010-02-27 2010-04-30 1 0
16 sue 2010-03-13 2010-05-31 2 0
17 sue 2010-04-27 2010-06-30 3 0
18 sue 2010-04-27 2010-06-30 3 0
19 sue 2010-05-06 2010-08-31 4 0
20 sue 2010-06-08 2010-09-30 5 0
21 mike 2010-05-01 2010-07-30 1 NA
22 mike 2010-06-01 2010-07-30 1 0
23 mike 2010-11-12 2011-07-30 2 1
I have found the post "How can I compare a value in a column to the previous one using R?" quite useful, but I feel that I can't use a loop, as my df has 4 million rows and I have already had a lot of difficulty trying to run a loop on it.
To this end, I think the code I need is something like this:
df$gaps<-ifelse(df$date_of_claim>=df$elig_end_date+30,1,0) ## this doesn't use the preceding row.
I've made a clumsy attempt using this:
df$gaps<-df$date_of_claim>=df$elig_end_date[-1,]
but I get an error saying I have an incorrect number of dimensions.
All help is greatly appreciated! Thank you.
With four million observations I would use data.table:
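A minimal sketch of that idea, assuming the rows are already in claim-date order within each person (as in the sample) and that a gap means a claim strictly more than 30 days after the previous row's elig_end_date. The small data frame below is just a subset of the sample, rebuilt so the snippet runs on its own; `shift()` is data.table's lag function and returns NA for the first row of each group, which matches the NA in the desired 'gaps' column.

```r
library(data.table)

# Subset of the sample data from the question (tom and mike only)
df <- data.frame(
  names = c("tom", "tom", "mike", "mike", "mike"),
  date_of_claim = as.Date(c("2010-01-01", "2010-05-04",
                            "2010-05-01", "2010-06-01", "2010-11-12")),
  elig_end_date = as.Date(c("2010-07-01", "2010-07-01",
                            "2010-07-30", "2010-07-30", "2011-07-30"))
)

dt <- as.data.table(df)

# Within each person, compare each claim date with the previous row's
# eligibility end date plus 30 days. shift() lags by one row and fills
# the first row of each group with NA, so gaps is NA there.
dt[, gaps := as.integer(date_of_claim > shift(elig_end_date) + 30), by = names]

print(dt)
```

If the data are not already sorted, something like `setorder(dt, names, date_of_claim)` before the grouped step should put each person's claims in date order. Swap `>` for `>=` if a claim exactly 30 days after the end date should also count as a gap.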