Searching for linear interpolation of time series data in R, I often found recommendations to use na.approx() from the zoo package.
However, with irregular time series I ran into problems, because na.approx() by default interpolates by position: the values are spread evenly across the gaps without taking the associated timestamps into account.
I found a workaround using approxfun(), but I wonder whether there is a cleaner solution, ideally based on tsibble objects with functions from the tidyverts package family?
Previous answers relied on expanding the irregular date grid to a regular grid by filling the gaps. However, this causes problems when the time of day should be taken into account during interpolation.
Here is a (revised) minimal example with POSIXct timestamps rather than Date only:
library(tidyverse)
library(zoo)
df <- tibble(date = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00")),
             value = c(1, NA, 2))
df %>%
  mutate(value_int_wrong = na.approx(value),
         value_int_correct = approxfun(date, value)(date))
# A tibble: 3 x 4
  date                value value_int_wrong value_int_correct
  <dttm>              <dbl>           <dbl>             <dbl>
1 2000-01-01 00:00:00     1             1                1
2 2000-01-02 02:00:00    NA             1.5              1.27
3 2000-01-05 00:00:00     2             2                2
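To make the difference between the two columns explicit, here is a small hand check of the arithmetic. It is only a sketch: the variable names (`stamps`, `w`, `by_time`, `by_position`) are made up for illustration, and `tz = "UTC"` is an assumption added so the result does not depend on the local time zone.

```r
# Timestamps from the example above (tz = "UTC" is an assumption for reproducibility)
stamps <- as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                     tz = "UTC")

# Fraction of the total time span that has elapsed at the missing point: 26 h / 96 h
w <- as.numeric(difftime(stamps[2], stamps[1], units = "hours")) /
  as.numeric(difftime(stamps[3], stamps[1], units = "hours"))

by_time     <- 1 + w * (2 - 1)    # time-weighted interpolation between 1 and 2
by_position <- 1 + 0.5 * (2 - 1)  # position-based: the NA sits halfway between its neighbours

round(c(by_time = by_time, by_position = by_position), 2)
```

The time-weighted value matches value_int_correct, while the position-based value matches value_int_wrong.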
Any ideas how to (efficiently) deal with this? Thanks for your support!
Personally, I would go with the solution that you are using, but to show how to use na.approx in this case: we can complete the sequence of dates before using na.approx, and then join the result with the original df to keep only the original rows.
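The steps described above could look roughly like the following. This is a minimal sketch, not a definitive implementation: it assumes that all timestamps fall on whole hours (so an hourly grid contains every original row), it adds `tz = "UTC"` for reproducibility, and it uses semi_join() as one possible way to get back to the original rows. The names `result` and `value_int` are made up for illustration.

```r
library(dplyr)
library(tidyr)
library(zoo)

df <- tibble(
  date  = as.POSIXct(c("2000-01-01 00:00", "2000-01-02 02:00", "2000-01-05 00:00"),
                     tz = "UTC"),  # tz is an assumption, not part of the question
  value = c(1, NA, 2)
)

result <- df %>%
  # expand to a regular hourly grid; the added rows get value = NA
  complete(date = seq(min(date), max(date), by = "hour")) %>%
  # on a regular grid, position-based interpolation equals time-based interpolation
  mutate(value_int = na.approx(value)) %>%
  # keep only the timestamps that were present in the original data
  semi_join(df, by = "date")

result
```

The grid resolution has to be at least as fine as the finest timestamp in the data; with timestamps down to minutes or seconds the intermediate table grows quickly, which is one reason to prefer the approxfun() approach from the question.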