I'm trying to generate 'episodes' of observations, grouping together observations that occur <= 14 days apart.
With dplyr I've managed to calculate the number of days since the last observation. However, I cannot figure out how to get a new id based on the conditional <= 14 without a for loop.
Sample data:
# obsvn is number of days since first observation in group
dat <- data.frame(id = c(rep("A", 5), rep("B", 2)),
                  obsvn = c(1, 2, 29, 30, 45, 1, 15))
  id obsvn
1  A     1
2  A     2
3  A    29
4  A    30
5  A    45
6  B     1
7  B    15
Expected output:
  id obsvn ith
1  A     1   1
2  A     2   1
3  A    29   2
4  A    30   2
5  A    45   3
6  B     1   1
7  B    15   2
I've tried using lag to carry the previous episode number forward:
library(dplyr)
dat <- dat %>%
  group_by(id) %>%
  mutate(ith = 1,
         ith = ifelse(obsvn - lag(obsvn) <= 14, lag(ith), lag(ith) + 1))
dat
Source: local data frame [7 x 3]
Groups: id
  id obsvn ith
1  A     1  NA
2  A     2   1
3  A    29   2
4  A    30   1
5  A    45   2
6  B     1  NA
7  B    15   1
Which isn't what I want. I don't understand why ith in row 4 is 1 rather than 2.
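Because it is returning lag(ith), which is always 1 (or NA at the start). mutate evaluates lag(ith) on the whole column at once, while ith is still the constant 1 set on the previous line, so it never sees the values computed for earlier rows. I would do it using diff and cumsum: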
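A minimal sketch of the diff and cumsum idea, using the sample data from the question. The helper name episode_index and its max_gap argument are made up for illustration and are not part of dplyr. diff() gives the gap to the previous observation within a group, the comparison flags rows that start a new episode, and cumsum() turns those flags into a running episode number.

```r
library(dplyr)

# Sample data from the question
dat <- data.frame(id = c(rep("A", 5), rep("B", 2)),
                  obsvn = c(1, 2, 29, 30, 45, 1, 15))

# Hypothetical helper (not a dplyr function).
# The leading TRUE marks the first row of a group as the start of episode 1;
# every later gap larger than max_gap starts another episode.
episode_index <- function(x, max_gap = 14) {
  cumsum(c(TRUE, diff(x) > max_gap))
}

res <- dat %>%
  group_by(id) %>%
  mutate(ith = episode_index(obsvn)) %>%
  ungroup()

res$ith
# 1 1 2 2 3 1 1
```

Note that with the rule as stated (gaps <= 14 stay in the same episode), row 7 stays in episode 1, because 15 - 1 is exactly 14. The expected output in the question puts it in episode 2; if a gap of exactly 14 should start a new episode, change the comparison to diff(x) >= max_gap.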