Something like conditional seq_along on grouped data


I'm trying to generate 'episodes' of observations, grouping together observations that occur no more than 14 days apart (<= 14). With dplyr I've managed to calculate the number of days since the previous observation. However, I can't figure out how to assign a new episode id based on the <= 14 condition without a for loop.

Sample data:

# obsvn is the number of days since the first observation in the group

dat <- data.frame(id = c(rep("A", 5), rep("B", 2)), 
                  obsvn = c(1, 2, 29, 30, 45, 1, 15))
dat
  id obsvn
1  A     1
2  A     2
3  A    29
4  A    30
5  A    45
6  B     1
7  B    15

Expected output:

  id obsvn ith
1  A     1    1
2  A     2    1
3  A    29    2
4  A    30    2
5  A    45    3
6  B     1    1
7  B    15    2

I've tried using lag to carry the previous episode number forward:

library(dplyr)

dat <- dat %>% 
  group_by(id) %>% 
  mutate(ith = 1,
         ith = ifelse(obsvn - lag(obsvn) <= 14, lag(ith), lag(ith) + 1))
dat
Source: local data frame [7 x 3]
Groups: id

  id obsvn ith
1  A     1  NA
2  A     2   1
3  A    29   2
4  A    30   1
5  A    45   2
6  B     1  NA
7  B    15   1

This isn't what I want, and I don't understand why ith in row 4 is 1 rather than 2.

Accepted answer by James:

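Because lag(ith) is the lag of the column you just set to 1 with mutate(ith = 1), it is always 1 (or NA in the first row of each group). ifelse() works on the whole vector at once, so a row never sees the value computed for the row above it, and each row can only get 1 or 2. Row 4 is only 1 day after row 3, so it gets lag(ith), which is 1.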
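A minimal base-R sketch of what happens for group A, assuming dplyr's lag() behaves like shifting the vector down by one and padding with NA (its default). The names prev_ith and gap are illustrative only, not part of the original code. Because ifelse() evaluates every row at once against the original column of 1s, the result alternates between 1 and 2 instead of accumulating:

```r
obsvn <- c(1, 2, 29, 30, 45)   # group A from the question
ith <- rep(1, length(obsvn))   # mutate(ith = 1) sets every row to 1 first

# Shift down by one and pad with NA: the same values dplyr::lag() gives here
prev_ith <- c(NA, head(ith, -1))
gap <- obsvn - c(NA, head(obsvn, -1))

# Vectorised: each row uses prev_ith (always 1), never a freshly computed value
result <- ifelse(gap <= 14, prev_ith, prev_ith + 1)
print(result)   # -> NA 1 2 1 2, the same as rows 1-5 of the output above
```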

I would do it using diff and cumsum:

dat %>% group_by(id) %>% mutate(ith = cumsum(c(1, diff(obsvn) >= 14)))
Source: local data frame [7 x 3]
Groups: id

  id obsvn ith
1  A     1   1
2  A     2   1
3  A    29   2
4  A    30   2
5  A    45   3
6  B     1   1
7  B    15   2
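To show why this works, here is a base-R sketch of the same idea without dplyr. The helper name episode_id() and its gap argument are hypothetical, introduced only for illustration. diff() gives the number of days between consecutive observations, the comparison flags the rows that start a new episode, c(1, ...) marks the first observation as the start of episode 1, and cumsum() turns the 0/1 flags into running episode numbers. ave() applies the helper within each id, playing the role of group_by():

```r
# Hypothetical helper: number the episodes in one group's sorted observation days.
# A new episode starts whenever the gap to the previous observation is >= gap.
episode_id <- function(days, gap = 14) {
  starts <- c(1, diff(days) >= gap)  # 1 for the first row, then 0/1 flags
  cumsum(starts)                     # running count of episode starts
}

dat <- data.frame(id = c(rep("A", 5), rep("B", 2)),
                  obsvn = c(1, 2, 29, 30, 45, 1, 15))

# Intermediate steps for group A
diff(c(1, 2, 29, 30, 45))          # 1 27 1 15
episode_id(c(1, 2, 29, 30, 45))    # 1 1 2 2 3

# Apply the helper per id, like group_by(id) %>% mutate(...)
dat$ith <- ave(dat$obsvn, dat$id, FUN = episode_id)
print(dat)
```

With a threshold of 14, a gap of exactly 14 days starts a new episode, which is what puts B's day-15 observation into episode 2 in the expected output. If observations exactly 14 days apart should stay in the same episode (the literal reading of "no more than 14 days apart"), the comparison would need to be diff(obsvn) > 14 instead.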