R - Timer based on column data (Condition has been true for x time) for large data sets

Disclaimer: I'm new to R, I have searched for an answer. There are similar problems, but I'm having issues translating what I've read into something meaningful for my implementation.

I am trying to add a condition timer column that measures how long sampleCondition has been continuously TRUE. If the condition returns to FALSE, the timer should reset. NOTE: I'm trying to get rid of the for loop. I'm currently calculating ConditionTime in seconds, but minutes would also be fine for this sample. The end result should look like this: [image: ConditionTime]

I am still learning, and so far every attempt I have made to make this work for large data sets (approx. 1 million rows) has just ended up breaking everything. Can someone either provide a sample solution or point me in the right direction? Any help is greatly appreciated. :)

#create sample DateTime
DateTime <- c("2017-09-01 09:37:04", "2017-09-01 09:38:04", "2017-09-01 09:39:04", "2017-09-01 09:40:04", "2017-09-01 09:41:04", "2017-09-01 09:42:04", "2017-09-01 09:43:04")
#create sample condition
sampleCondition <- c(0,1,0,0,1,1,0)
#create sample DF
sampleDF <- data.frame(DateTime,sampleCondition)

#calculate the time diff from data point to data point, in seconds (first row has no predecessor, so 0)
sampleDF$rowTimeDiff <- c(0, diff(as.numeric(as.POSIXct(sampleDF$DateTime, tz = "UTC"))))

#check if condition is true (else NA), check if condition was true in the last row. ConditionTime = sum of ConditionTime[previous row] and rowTimeDiff
sampleDF$ConditionTime <- NA_real_  #create the column up front so every row starts as NA
for (i in seq_len(nrow(sampleDF))) {
  if (sampleDF$sampleCondition[i] == 1) {
    #previous timer value; NA on the first row or when the previous condition was false
    prev <- if (i > 1) sampleDF$ConditionTime[i - 1] else NA_real_
    sampleDF$ConditionTime[i] <- if (is.na(prev)) sampleDF$rowTimeDiff[i] else prev + sampleDF$rowTimeDiff[i]
  }
}

Thanks again!

EDIT: Added more data to sample for clarity.

There is 1 answer

Answer by www (best answer)

Try this:

x <- sampleDF$sampleCondition

(cumsum(x)-cummax((!x)*cumsum(x)))*60
[1]   0  60   0   0  60 120   0
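The one-liner above multiplies by 60, which assumes every row is exactly one minute apart. A minimal sketch of the same cumsum/cummax idea driven by the actual timestamps follows, in case the sampling interval is irregular. The helper name condition_time and its tz argument are hypothetical (not part of the original answer), and the sketch assumes DateTime is sorted in ascending order. It returns NA where the condition is false, matching the question's loop rather than the 0 produced above.

```r
# Sample data from the question
sampleDF <- data.frame(
  DateTime = c("2017-09-01 09:37:04", "2017-09-01 09:38:04", "2017-09-01 09:39:04",
               "2017-09-01 09:40:04", "2017-09-01 09:41:04", "2017-09-01 09:42:04",
               "2017-09-01 09:43:04"),
  sampleCondition = c(0, 1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: seconds the condition has been true, reset on every false row.
# Assumes timestamps are sorted ascending.
condition_time <- function(datetime, condition, tz = "UTC") {
  t <- as.numeric(as.POSIXct(datetime, tz = tz))  # seconds since epoch
  elapsed <- t - t[1]                             # same as cumsum of row-to-row diffs
  is_false <- condition != 1
  # elapsed time at the most recent false row (0 if none seen yet)
  last_false <- cummax(ifelse(is_false, elapsed, 0))
  out <- elapsed - last_false
  out[is_false] <- NA_real_                       # no timer value while condition is false
  out
}

sampleDF$ConditionTime <- condition_time(sampleDF$DateTime, sampleDF$sampleCondition)
print(sampleDF$ConditionTime)
# [1]  NA  60  NA  NA  60 120  NA
```

Because elapsed never decreases, cummax picks out the elapsed time at the last false row, and subtracting it restarts the timer for each new run of true values.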

Time test:

library(microbenchmark)

microbenchmark(
  (cumsum(x)-cummax((!x)*cumsum(x)))*60
)

Unit: nanoseconds
expr min    lq    mean median     uq   max neval
  60 973 989.5 1357.09   1060 1139.5 23265   100
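The timing above was taken on the 7-element sample, so it says little about the roughly 1 million rows mentioned in the question. A rough sketch for checking that scale with base R's system.time is below. The synthetic data, the seed, and the helper name run_timer are assumptions for illustration only, and no timing figures are quoted here because they depend on the machine.

```r
set.seed(42)                 # synthetic data, only for illustration
n <- 1e6
x <- rbinom(n, 1, 0.5)       # random 0/1 condition vector

# Same expression as the answer, wrapped in a hypothetical helper;
# step is the assumed fixed sampling interval in seconds
run_timer <- function(x, step = 60) {
  (cumsum(x) - cummax((!x) * cumsum(x))) * step
}

timing <- system.time(res <- run_timer(x))
print(timing)

# Cross-check the first 1000 values against a plain loop
ref <- numeric(1000)
for (i in seq_len(1000)) {
  prev <- if (i > 1) ref[i - 1] else 0
  ref[i] <- if (x[i] == 1) prev + 60 else 0
}
stopifnot(all(res[1:1000] == ref))
```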

Sample data:

sampleDF <- data.frame(
  DateTime=c("2017-09-01 09:37:04", "2017-09-01 09:38:04", "2017-09-01 09:39:04", "2017-09-01 09:40:04", "2017-09-01 09:41:04", "2017-09-01 09:42:04", "2017-09-01 09:43:04"),
  sampleCondition=c(0,1,0,0,1,1,0)
)