I have a problem applying a function (min) to a specific repeating time period. My data looks like this sample:

library(xts)
start <- as.POSIXct("2018-05-18 00:00")
tseq <- seq(from = start, length.out = 1440, by = "10 mins")
set.seed(10)  # seed set up front so the sample is reproducible
Measurings <- data.frame(
  Time = tseq,
  Temp = sample(10:37, 1440, replace = TRUE)
)
Measurings_xts <- xts(Measurings[,-1], Measurings$Time)

With much appreciated help, I found out that min and max (unlike mean, which works directly in period.apply) must be wrapped in a helper function. They can then be calculated for regular calendar periods (hours, days, years, ...) with this solution:

colMin <- function(x, na.rm = FALSE) {
  apply(x, 2, min, na.rm = na.rm)
}

epHours <- endpoints(Measurings_xts, "hours")
Measurings_min <- period.apply(Measurings_xts, epHours, colMin)

For meteorological analyses I need to calculate further minima for a less intuitive timespan, crossing the calendar day, that I fail to define in code:

I need to output the minimum nighttime temperature from e.g. 2018-05-18 19:00 to 2018-05-19 7:00 in the morning for each night in my dataset.

I have tried shifting the time column up or down so that each night falls within one calendar day. This is error-prone, though, and doesn't work for my real data, where some observations are missing. How can I use POSIXct datetimes and/or xts functionality to calculate the minima in this case?

2 Answers

Dan

Here is one approach that works by defining a new group for each night interval:

# define the time interval, e.g. from 19:00 to 7:00
from <- 19
to <- 7
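# hour of day for each observation
hours <- as.numeric(strftime(index(Measurings_xts), format="%H"))
# runs of 1 are night intervals, runs of 0 are daytime
y <- rle(as.numeric(findInterval(hours, c(to,from)) != 1))
# number the night runs 1, 2, 3, ... (works whether the data starts at night or in daytime)
y$values[y$values == 1] <- seq_len(sum(y$values == 1))
grp <- inverse.rle(y)
# grp is a grouping variable that is 0 for everything outside the
# defined interval, 1 for the first night, 2 for the second...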


s <- split(Measurings_xts, grp); s$`0` <- NULL
# min_value will contain the minimum value for each night interval 
min_value <- sapply(s, min)

# to see the date interval for each value
start <- sapply(s, function(x) as.character(index(x)[1]))
end <- sapply(s, function(x) as.character(index(x)[length(x)]))
data.frame(start, end, min_value)

#                start                 end   min_value
#1           2018-05-18 2018-05-18 06:50:00         10
#2  2018-05-18 19:00:00 2018-05-19 06:50:00         10
#3  2018-05-19 19:00:00 2018-05-20 06:50:00         10
#4  2018-05-20 19:00:00 2018-05-21 06:50:00         10
#5  2018-05-21 19:00:00 2018-05-22 06:50:00         10
#6  2018-05-22 19:00:00 2018-05-23 06:50:00         10
#7  2018-05-23 19:00:00 2018-05-24 06:50:00         11
#8  2018-05-24 19:00:00 2018-05-25 06:50:00         10
#9  2018-05-25 19:00:00 2018-05-26 06:50:00         10
#10 2018-05-26 19:00:00 2018-05-27 06:50:00         10
#11 2018-05-27 19:00:00 2018-05-27 23:50:00         12
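
The grouping idea above can also be sketched without run-length encoding, by keying each observation on the date its night began. This is only a sketch in base R under stated assumptions: the vectors tstamp and temp, the toy values, and the names is_night and night_id are hypothetical and not part of the code above; timestamps are in UTC, so there are no daylight-saving jumps; and 07:00 itself is treated as outside the night, as in the output above. Because it relies on the timestamps rather than row positions, missing observations should not shift the groups.

```r
# Hypothetical toy data: 10-minute readings over three days (UTC).
set.seed(1)
tstamp <- seq(as.POSIXct("2018-05-18 00:00", tz = "UTC"),
              by = "10 mins", length.out = 432)
temp <- sample(10:37, length(tstamp), replace = TRUE)

from <- 19  # night starts at 19:00
to   <- 7   # night ends before 07:00

# Flag observations that fall inside the night window.
hour <- as.POSIXlt(tstamp)$hour
is_night <- hour >= from | hour < to

# Shift every timestamp back by `to` hours, so the evening part and the
# early-morning part of one night land on the same calendar date
# (the date on which that night began).
night_id <- format(tstamp - to * 3600, "%Y-%m-%d")

# Minimum temperature per night, named by the night's starting date.
night_min <- tapply(temp[is_night], night_id[is_night], min)
night_min
```

The first and last entries cover only partial nights (the data starts at midnight and stops at 23:50), which mirrors rows 1 and 11 of the table above.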

FXQuantTrader

You could solve this by creating your own "end points" for period.apply:

# Choose the appropriate time ranges
z <- Measurings_xts["T19:00/T07:00"]
# Creating your own "endpoints":
epNights <- which(diff.xts(index(z), units = "mins") > 10) - 1

Subtract one from each index, because which() reports each jump at the first row of the next "night interval", whereas an end point must be the last row of the previous one.

Then append the position of the last data point to your end points vector, and you can use the result in period.apply:

epNights <- c(epNights, nrow(z))

Measurings_min <- period.apply(z, epNights, colMin)
Measurings_min
#                     [,1]
# 2018-05-18 07:00:00   10
# 2018-05-19 07:00:00   10
# 2018-05-20 07:00:00   10
# 2018-05-21 07:00:00   10
# 2018-05-22 07:00:00   10
# 2018-05-23 07:00:00   10
# 2018-05-24 07:00:00   11
# 2018-05-25 07:00:00   10
# 2018-05-26 07:00:00   10
# 2018-05-27 07:00:00   10
# 2018-05-27 23:50:00   12
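
One caveat with gap-based end points: a missing observation inside a night also produces a gap longer than 10 minutes, which would split that night in two. The sketch below illustrates this in base R with a hypothetical toy index (two nights, one reading dropped) and an assumed threshold gap_limit of 60 minutes. The names idx, gap_mins, ep_strict and ep_loose are illustrative only, and base diff() is used instead of diff.xts, so nothing is subtracted from the positions.

```r
# Hypothetical toy index: two nights of 10-minute timestamps (19:00 to 07:00, UTC).
night1 <- seq(as.POSIXct("2018-05-18 19:00", tz = "UTC"),
              as.POSIXct("2018-05-19 07:00", tz = "UTC"), by = "10 mins")
night2 <- seq(as.POSIXct("2018-05-19 19:00", tz = "UTC"),
              as.POSIXct("2018-05-20 07:00", tz = "UTC"), by = "10 mins")
idx <- c(night1, night2)[-10]   # drop one reading to mimic a missing observation

# Base diff() has no leading NA: element i is the gap between rows i and i + 1,
# so which() already returns the last row before each gap.
gap_mins <- diff(as.numeric(idx)) / 60

# End points with the strict 10-minute threshold.
ep_strict <- c(0, which(gap_mins > 10), length(idx))

# End points with a looser threshold (an assumption; pick a value shorter than
# the daytime gap but longer than any plausible run of missing readings).
gap_limit <- 60
ep_loose <- c(0, which(gap_mins > gap_limit), length(idx))

ep_strict   # 0 9 72 145: the missing reading splits the first night
ep_loose    # 0 72 145: one block per night
```

With real data, a vector like ep_loose could be passed to period.apply in place of epNights, provided the positions refer to the rows of the same night-only subset.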