I want to create a survival dataset featuring multiple-record ids. The existing event data consists of single-row observations with the date formatted as dd/mm/yy. The idea is to count the number of consecutive months in which there is at least one event per month (there are multiple years, so this has to be accounted for somehow). In other words, I want to create episodes that capture such monthly streaks, including periods of inactivity. To give an example, the code should transform something like this:
df1
id event.date
group1 01/01/16
group1 05/02/16
group1 07/03/16
group1 10/06/16
group1 12/09/16
to this:
df2
id t0 t1 ep.no ep.t ep.type
group1 1 3 1 3 1
group1 4 5 2 2 0
group1 6 6 3 1 1
group1 7 8 4 2 0
group1 9 9 5 1 1
group1 10 ... ... ... ...
where t0 and t1 are the start and end months, ep.no is the episode counter for the particular id, ep.t is the length of that particular episode, and ep.type indicates the type of episode (1 = active, 0 = inactive). In the example above, there are three initial months of activity, then a two-month break, followed by a single-month episode of relapse, and so on.
I am mostly concerned about the transformation that produces t0 and t1 when going from df1 to df2, as the other variables in df2 can be constructed afterwards based on them (e.g. ep.no is a counter, ep.t is arithmetic on t0 and t1, and ep.type always starts out as 1 and alternates). Given the complexity of the problem (at least for me), I understand the need to provide the actual data, but I am not sure if that is allowed. I will see what I can do if a mod chimes in.
I think this does what you want. The trick is identifying the sequence of observations that need to be treated together, and using dplyr::lag with cumsum is the way to go.
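A minimal sketch of the lag/cumsum idea, assuming dplyr and lubridate are available. The helper columns mon, idx, run and next_t0 are names chosen for illustration, and this is not necessarily the exact code this answer was built on. Consecutive active months are grouped into runs, and the inactive episodes are then derived as the gaps between runs:

```r
library(dplyr)
library(lubridate)

# Example data from the question (dates are dd/mm/yy)
df1 <- tibble(
  id = "group1",
  event.date = c("01/01/16", "05/02/16", "07/03/16", "10/06/16", "12/09/16")
)

# Active episodes: one row per run of consecutive months with an event
active <- df1 %>%
  mutate(mon = floor_date(dmy(event.date), "month")) %>%  # first day of the event month
  distinct(id, mon) %>%                                   # several events in a month count once
  arrange(id, mon) %>%
  group_by(id) %>%
  mutate(
    idx = year(mon) * 12 + month(mon),                    # month index that survives year changes
    run = cumsum(idx - lag(idx, default = first(idx)) > 1) # new run whenever a month is skipped
  ) %>%
  group_by(id, run) %>%
  summarise(t0 = min(mon), t1 = max(mon), .groups = "drop") %>%
  mutate(ep.type = 1L)

# Inactive episodes: the gap between one active run and the next
inactive <- active %>%
  group_by(id) %>%
  mutate(next_t0 = lead(t0)) %>%
  ungroup() %>%
  filter(!is.na(next_t0)) %>%                             # no gap after the last run
  transmute(id, t0 = t1 + months(1), t1 = next_t0 - months(1), ep.type = 0L)

# Combine, order, and add the counter and the episode length in months
df2 <- bind_rows(select(active, -run), inactive) %>%
  arrange(id, t0) %>%
  group_by(id) %>%
  mutate(
    ep.no = row_number(),
    ep.t  = (year(t1) - year(t0)) * 12 + month(t1) - month(t0) + 1
  ) %>%
  ungroup() %>%
  select(id, t0, t1, ep.no, ep.t, ep.type)
```

On the example data this should yield five episodes, alternating active and inactive, with t0 and t1 stored as the first day of the start and end months rather than as month numbers.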
Using complete() with mon=1:12 will always make the last episode stretch to the end of that year. The solution would be to insert a filter() on yr and mon after complete().
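As a rough illustration of that trimming step, here is a sketch that assumes a monthly table with columns yr and mon plus a hypothetical 0/1 flag called active (the flag name and the toy data are my assumptions). After complete() pads every year to twelve months, the filter() drops the padded months that fall after the last month with an event:

```r
library(dplyr)
library(tidyr)

# Hypothetical monthly summary: one row per id/year/month that had an event
monthly <- tibble(
  id     = "group1",
  yr     = 2016L,
  mon    = c(1L, 2L, 3L, 6L, 9L),
  active = 1L
)

trimmed <- monthly %>%
  # pad each id/year to all 12 months; padded months are marked inactive
  complete(nesting(id, yr), mon = 1:12, fill = list(active = 0L)) %>%
  group_by(id) %>%
  # keep nothing beyond the last month that actually had an event
  filter(yr * 12L + mon <= max((yr * 12L + mon)[active == 1L])) %>%
  ungroup()
```

Combining yr and mon into a single index inside the filter() keeps the comparison valid when an id spans more than one year.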
The advantage of keeping t0 and t1 as date objects is that they work correctly across year boundaries, which plain month numbers (1 to 12) do not.
Session information: