Modified data:
structure(list(hour = c(0L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L), cs = c(0L, 0L, 0L, 0L,
0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L,
1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L
), cs_acum = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L), cs_wanted = c(0L, 0L, 0L, 0L,
0L, 1L, 2L, 3L, 0L, 0L, 4L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L,
3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 0L, 0L
), cs_acum2 = c(0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 4L, 5L,
0L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
1L, 2L, 3L, 0L, 4L, 5L, 0L, 0L)), .Names = c("hour", "cs", "cs_acum",
"cs_wanted", "cs_acum2"), class = c("data.table", "data.frame"
), row.names = c(NA, -36L), .internal.selfref = <pointer: 0x00000000001f0788>)
cs_acum
is cumulative sum of cs
with restart at 0.
df1$cs_acum <- with(df1, ave(df1$cs, cumsum(df1$cs == 0), FUN = cumsum))
I need this accumulation to continue if there is value of 1 in 5 rows of hour
after the accumulation of 1's from cs
has stopped.
Desired output is in col cs_wanted
.
Further explanation: çs_acum
is accumulation of hours (rows f cs
) that meet certain criteria. After this, it has nothing to do with cs
any more, it is then related to col: hour
. The accumulation should continue if there is a value of 1 in 5 hour window after it has stopped.
Probably a new function checking five lines in hour
from the position in cs_acum
turns to 0, would be in order, continuing accumulation from where it has stopped in cs_acum
.
Possible steps:
find position where accumulation stops
look at next five rows in hour
if there are values of 1, continue accumulation for that line,
look again in next five hours,
if there is no values of 1, do nothing.
New data:
df3 <- structure(list(hour = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
cs = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
cs_acum = c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13),
cs_acum2 = c(0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0, 0, 0, 8, 9, 10, 11, 12, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28)),
.Names = c("hour", "cs", "cs_acum", "cs_acum2"), class = "data.frame", row.names = c(NA, -68L))
Using:
gives:
Explanation:
setDT(df1)
.rl <- rle(d1$hour)
andgrp := rleid(rep(rl$lengths >5 & rl$values == 0, rl$lengths))
you create a grouping variable that only changes when there are more than 5 zero's.hour == 1
and create a get the cumulative sum withcumsum(hour)
. If your the values inhour
are only1
's and0
's, you could also create a counter withseq_along
or1:.N
which will give the same result.is.na(cs_acum2), cs_acum2 := 0
you change the NA's to zero's.Update 1: For the new example data (
df2
):which gives:
The way I understood it is that the
cumsum
ofhour
is only allowed to start after the first appearance ofcs == 1
.Additional explanation:
rn = .I
you creat a rowindexnumber.df2[, .I[cs == 1]][1]
give you the rownumber wherecs == 1
for the first time.rn >= df2[, .I[cs == 1]][1]
you select only the rows from that point onward.Update 2: With regard to the latest (fourth) dataset, you could do:
which gives:
Used data
First example dataset:
Second dataset:
Fourth dataset: