How to capture the next event based on a condition

95 views Asked by At

I have a huge data set but I will explain the problem based on an example. In the below table I have a Unique ID, I have a Date (time series for each Unique ID) and status of that Unique ID.

Unique ID   Date    Status 
        1   Jan-06  Active 
        1   Feb-06  Active 
        1   Mar-06  Not active 
        1   Apr-06  Stable 
        1   May-06  Active 
        1   Jun-06  Stable 
        1   Jul-06  Active 
        1   Aug-06  Active 
        1   Sep-06  Active 
        2   Oct-06  Active 
        2   Nov-06  Not active 
        2   Dec-06  Stable 
        2   Jan-07  Active 
        2   Feb-07  Stable 
        2   Mar-07  Active 
        2   Apr-07  Active 
        2   May-07  Active 

Result I am trying to get to is to capture the date of next event of happening (Not active or stable )

If you see below unique ID 1 was active as of Jan 2006, we need to capture when it hit not active or stable, it hit some time in March 06.

If you see below unique ID 1 was active as of May 2006, we need to capture when it hit not active or stable, it hit some time in Jun 06.

Note: No Need to add any date for IDs which are already in not active or stable

Unique ID   Date    Status  Result 
        1   Jan-06  Active  Mar-06
        1   Feb-06  Active  Apr-06
        1   Mar-06  Not active  NA
        1   Apr-06  Stable  NA 
        1   May-06  Active  Jun-06
        1   Jun-06  Stable  NA 
        1   Jul-06  Active  Always active 
        1   Aug-06  Active  Always active 
        1   Sep-06  Active  Always active 
        2   Oct-06  Active  Nov-06
        2   Nov-06  Not active  NA 
        2   Dec-06  Stable  NA 
        2   Jan-07  Active  Feb-07
        2   Feb-07  Stable  NA
        2   Mar-07  Active  Always active 
        2   Apr-07  Active  Always active 
        2   May-07  Active  Always active 
1

There are 1 answers

0
mkchi On BEST ANSWER

Generate example data:

> set.seed(1234)
> id <- rep(1:3, times=c(10,10,10))
> date <- seq(as.Date("2000/1/1"), by = "month", length.out = length(id))
> date <- format(date,  "%b-%y")
> status <- sample(c("Active", "Not Active", "Stable"), length(id), replace=TRUE)
> data.frame(id, date, status)
   id   date     status
1   1 Jan-00     Active
2   1 Feb-00 Not Active
3   1 Mar-00 Not Active
4   1 Apr-00 Not Active
5   1 May-00     Stable
6   1 Jun-00 Not Active
7   1 Jul-00     Active
8   1 Aug-00     Active
9   1 Sep-00 Not Active
10  1 Oct-00 Not Active
11  2 Nov-00     Stable
12  2 Dec-00 Not Active
13  2 Jan-01     Active
14  2 Feb-01     Stable
15  2 Mar-01     Active
16  2 Apr-01     Stable
17  2 May-01     Active
18  2 Jun-01     Active
19  2 Jul-01     Active
20  2 Aug-01     Active
21  3 Sep-01     Active
22  3 Oct-01     Active
23  3 Nov-01     Active
24  3 Dec-01     Active
25  3 Jan-02     Active
26  3 Feb-02     Stable
27  3 Mar-02 Not Active
28  3 Apr-02     Stable
29  3 May-02     Stable
30  3 Jun-02     Active

Replace the last "Active" observation for each id with "Always Active" in copies of date and status variables:

> id_last_active <- (id != c(id[-1], FALSE)) & (status == "Active")
> date2 <- as.character(date)
> date2[id_last_active] <- "Always Active"
> status2 <- status
> status2[id_last_active] <- "Always Active"

Use which() to get positions of non-"Active" observations (regardless of id), and use diff() and rep() to create a vector of positions of the next non-"Active" event for the group of prior "Active" events. Then get the date of the next non-active event for "Active" events:

> nonactive <- which(status2 != "Active")
> status_len <- c(nonactive[1], diff(nonactive))
> next_event <- date2[rep(nonactive, times = status_len)]
> next_event[status2 %in% c("Stable", "Not Active")] <- NA

Print the data by id (using by()):

> dat <- data.frame(id, date, status, next_event)
> by(dat, dat$id, function(x) x)
dat$id: 1
   id   date     status next_event
1   1 Jan-00     Active     Feb-00
2   1 Feb-00 Not Active       <NA>
3   1 Mar-00 Not Active       <NA>
4   1 Apr-00 Not Active       <NA>
5   1 May-00     Stable       <NA>
6   1 Jun-00 Not Active       <NA>
7   1 Jul-00     Active     Sep-00
8   1 Aug-00     Active     Sep-00
9   1 Sep-00 Not Active       <NA>
10  1 Oct-00 Not Active       <NA>
--------------------------------------------------------------- 
dat$id: 2
   id   date     status    next_event
11  2 Nov-00     Stable          <NA>
12  2 Dec-00 Not Active          <NA>
13  2 Jan-01     Active        Feb-01
14  2 Feb-01     Stable          <NA>
15  2 Mar-01     Active        Apr-01
16  2 Apr-01     Stable          <NA>
17  2 May-01     Active Always Active
18  2 Jun-01     Active Always Active
19  2 Jul-01     Active Always Active
20  2 Aug-01     Active Always Active
--------------------------------------------------------------- 
dat$id: 3
   id   date     status    next_event
21  3 Sep-01     Active        Feb-02
22  3 Oct-01     Active        Feb-02
23  3 Nov-01     Active        Feb-02
24  3 Dec-01     Active        Feb-02
25  3 Jan-02     Active        Feb-02
26  3 Feb-02     Stable          <NA>
27  3 Mar-02 Not Active          <NA>
28  3 Apr-02     Stable          <NA>
29  3 May-02     Stable          <NA>
30  3 Jun-02     Active Always Active