R: how to resample a datetime variable at the millisecond level?


I have a dataframe like the following

library(dplyr)
library(lubridate)
time = c('2013-01-03 22:04:21.549', '2013-01-03 22:04:21.549', '2013-01-03 22:04:21.559', '2013-01-03 22:04:23.559' )
value = c(1,2,3,4)

data <- tibble(time, value)
data <- data %>% mutate(time = ymd_hms(time))

# A tibble: 4 × 2
                     time value
                   <dttm> <dbl>
1 2013-01-03 22:04:21.549     1
2 2013-01-03 22:04:21.549     2
3 2013-01-03 22:04:21.559     3
4 2013-01-03 22:04:23.559     4

I would like to resample this dataframe every 200 milliseconds.

That is, take the average of value every 200 milliseconds.

I know I can use lubridate::floor_date(time, '1 second') down to one-second precision, but not for milliseconds.

In the example above, rows 1, 2, and 3 should be grouped together, while row 4 should be alone (note that it is 2 seconds apart from the others).

Any ideas? Thanks!

2

There are 2 answers

3
IRTFM On BEST ANSWER

The fact that your comment on the xts solution asked for this to be "plugged back in" to the dataframe made me think you wanted either a merged result or a grouped-by-time column. That's what the ave function does in base R. There's probably a dplyr equivalent, but I'm more of a base-R guy:

 # arg holds the times as numeric seconds; the breaks are 0.2 s (200 ms) apart
 data$ms200mn <- ave(data$value,
                     cut( arg <- as.numeric(data$time),
                          breaks=seq( floor(arg[1]), ceiling(arg[4]), by=0.2) ),
                     FUN=mean)
>  data
# A tibble: 4 × 3
                 time value ms200mn
               <dttm> <dbl>   <dbl>
1 2013-01-03 22:04:21     1       2
2 2013-01-03 22:04:21     2       2
3 2013-01-03 22:04:21     3       2
4 2013-01-03 22:04:23     4       4

Strictly speaking, this is aggregation rather than "sampling" (or resampling). There is no 'msec' option for the seq.POSIXt function and fractional seconds are not allowed, so the times needed to be converted to numeric seconds.

Explanation of:

cut(arg <- as.numeric(data$time), breaks=seq( floor(arg[1]), ceiling(arg[4]), by=0.2) )

It "classifies" or "categorizes" the items into groups defined by a sequence of breaks that starts below the first item and ends above the last item. The arg value needed to be created because (for reasons that I don't understand) raw 'datetime' variables cannot be used by the seq function.
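For readers who prefer dplyr, the same grouping idea could be sketched roughly as follows. This is not from the answer above, only one possible equivalent: it assumes the bins are aligned to the epoch and computed with floor() rather than cut(), so a timestamp that falls exactly on a bin edge may land in a different group than with cut(). The names width, bin, binned and summary_tbl are made up for illustration.

```r
library(dplyr)

# Same example data as in the question (time zone fixed to UTC as an assumption)
data <- tibble(
  time  = as.POSIXct(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                       "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559"),
                     tz = "UTC"),
  value = c(1, 2, 3, 4)
)

width <- 0.2  # bin width in seconds (200 ms); hypothetical name

# One row per original observation, with the bin mean attached (like ave())
binned <- data %>%
  mutate(bin = floor(as.numeric(time) / width)) %>%  # index of the 200 ms bin
  group_by(bin) %>%
  mutate(ms200mn = mean(value)) %>%
  ungroup()

# One row per non-empty bin instead
summary_tbl <- data %>%
  mutate(bin = floor(as.numeric(time) / width)) %>%
  group_by(bin) %>%
  summarise(mean_value = mean(value))
```

The mutate() version keeps all four rows, which matches the "plugged back in" request; the summarise() version collapses to one row per 200 ms bin.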

5
Joshua Ulrich On

Since you used the [xts] tag, here's an xts solution:

options(digits.secs=6)
require(xts)
x <- xts(1:4, as.POSIXct(c('2013-01-03 22:04:21.549', '2013-01-03 22:04:21.549',
                           '2013-01-03 22:04:21.559', '2013-01-03 22:04:23.559')))
period.apply(x, endpoints(x, "ms", 200), mean)
#                         [,1]
# 2013-01-03 22:04:21.559    2
# 2013-01-03 22:04:23.559    4

Starting from your data object:

x <- with(data, xts(value, time))
period.apply(x, endpoints(x, "ms", 200), mean)
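If the aggregated result is needed as a regular data frame again, one way that should work is to pull the index and the values out of the xts object. This is only a sketch under the assumption that a plain two-column data frame is what is wanted; agg, agg_df and mean_value are illustrative names, not part of xts.

```r
library(xts)

# Same example data as in the question (time zone fixed to UTC as an assumption)
time  <- as.POSIXct(c("2013-01-03 22:04:21.549", "2013-01-03 22:04:21.549",
                      "2013-01-03 22:04:21.559", "2013-01-03 22:04:23.559"),
                    tz = "UTC")
value <- c(1, 2, 3, 4)

x   <- xts(value, order.by = time)
agg <- period.apply(x, endpoints(x, "ms", 200), mean)

# index() and coredata() come from zoo, which is attached along with xts
agg_df <- data.frame(time       = index(agg),
                     mean_value = as.numeric(coredata(agg)))
```

Each row of agg_df is stamped with the last observation time in its 200 ms period, as in the period.apply output above.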