I am trying to keep only the deids that have multiple observations.
I have the data below:
help <- data.frame(deid = c(1, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12),
session.number = c(1, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
days.since.last = c(0, 0, 7, 14, 93, 5, 102, 0, 21, 104, 4))
   deid session.number days.since.last
1     1              1               0
2     5              1               0
3     5              2               7
4     5              3              14
5     5              4              93
6     5              5               5
7     5              6             102
8    12              1               0
9    12              2              21
10   12              3             104
11   12              4               4
My attempt was to use group_by() and then filter():
help %>% group_by(deid) %>% filter(session.number >=2)
However, this only keeps rows where session.number is 2 or greater. It does drop deid = 1, but the data for every remaining deid now starts at session.number 2 instead of session.number 1.
What I am trying to tell R is to keep the groups (deid) that have more than one observation (session.number).
Any assistance is greatly appreciated.
This should do it: filter on the number of observations in each group, which you get with n().
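A minimal sketch of that filter, assuming dplyr is installed and using the `help` data frame from the question. The name `kept` and the trailing `ungroup()` are my own additions for illustration:

```r
library(dplyr)

help <- data.frame(
  deid = c(1, 5, 5, 5, 5, 5, 5, 12, 12, 12, 12),
  session.number = c(1, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4),
  days.since.last = c(0, 0, 7, 14, 93, 5, 102, 0, 21, 104, 4)
)

# Inside a grouped filter(), n() is the row count of the current group,
# so the condition is evaluated once per deid and keeps or drops
# all of that deid's rows together.
kept <- help %>%
  group_by(deid) %>%
  filter(n() > 1) %>%
  ungroup()  # drop the grouping so later steps see a plain table

kept
```

Because the condition is on the group size rather than on session.number, deid 1 is dropped while deids 5 and 12 keep all of their rows, including session.number 1.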