Irregular time series in fable package


I think I read somewhere that the tsibble and fable packages can handle irregular time series, but I could not find any examples of how to do it. Some questions I have are:

  1. Do I have to convert an irregular time series to a regular one before I can model it? (As far as I know, irregular time series need to be converted to regular ones. Please let me know if this is not the case, and if so, which models do not need a regular time series?)
  2. What tools and models in the tidyverts packages (tsibble, fable, fabletools) handle irregular time series?

Are there any questions or links where I can see a working example? For example, this question uses zoo/xts to handle it.

I saw some related capabilities in zoo/xts, which is always good, but I am spinning my wheels trying to get this to work with fable.

For a sample dataset we can use:

    DF <- structure(list(
      station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
      Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L),
        .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974",
                   "01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"),
        class = "factor"),
      WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999,
                    5, 5, 8.6000004, 8.1333332, 12.7999999)),
      .Names = c("station", "Time", "WaterTemp"),
      class = "data.frame", row.names = c(NA, -10L))

Answer by Mitchell O'Hara-Wild

Most models available in {fable} require the observations to be regularly spaced, and many of them also require that there are no gaps in the data. One example of a model that supports irregular data is fable::TSLM().
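As a minimal sketch of fitting TSLM() to an irregular tsibble, assuming (as stated above) that TSLM() accepts irregular data, the code below rebuilds the sample data with already-parsed dates and fits one regression per station. The `month_num` column is a hypothetical numeric time covariate introduced only for illustration; it works as a trend here only because every observation falls in 1974.

```r
library(tsibble)
library(dplyr)
library(fable)  # also attaches fabletools, which provides model() and tidy()

# The question's sample data, with the dd-mm-yyyy labels already parsed into Dates
dates <- as.Date(c(
  "1974-01-01", "1974-02-01", "1974-03-01", "1974-05-01", "1974-07-01",
  "1974-01-01", "1974-02-01", "1974-04-01", "1974-06-01", "1974-08-01"
))
obs <- tibble(
  station = rep(1:2, each = 5),
  Time = yearmonth(dates),
  # Hypothetical numeric time covariate: calendar month (only valid within 1974)
  month_num = as.numeric(format(dates, "%m")),
  WaterTemp = rep(c(5, 5, 8.6000004, 8.1333332, 12.7999999), 2)
)

# Irregular tsibble: no common interval is inferred from the index
irregular <- as_tsibble(obs, key = station, index = Time, regular = FALSE)

# One linear regression of WaterTemp on month_num for each station
fit <- irregular %>% model(tslm = TSLM(WaterTemp ~ month_num))
tidy(fit)
```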

The example data above is typically considered 'regular' but with gaps: it has a common interval of one month, but some months are missing. Here is how a tsibble for this data can be produced:

DF <- structure(list(
  station = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L),
  Time = structure(c(1L, 2L, 3L, 5L, 7L, 1L, 2L, 4L, 6L, 8L),
    .Label = c("01-01-1974", "01-02-1974", "01-03-1974", "01-04-1974",
               "01-05-1974", "01-06-1974", "01-07-1974", "01-08-1974"),
    class = "factor"),
  WaterTemp = c(5, 5, 8.6000004, 8.1333332, 12.7999999,
                5, 5, 8.6000004, 8.1333332, 12.7999999)),
  .Names = c("station", "Time", "WaterTemp"),
  class = "data.frame", row.names = c(NA, -10L))

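# Convert Time (a factor of dd-mm-yyyy labels) to a valid yearmonth index variable
library(tsibble)
library(dplyr)
DF <- DF %>% 
  # format() turns the factor into its character labels before parsing them as dates
  mutate(Time = yearmonth(as.Date(format(Time), format = "%d-%m-%Y")))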
DF
#>    station     Time WaterTemp
#> 1        1 1974 Jan  5.000000
#> 2        1 1974 Feb  5.000000
#> 3        1 1974 Mar  8.600000
#> 4        1 1974 May  8.133333
#> 5        1 1974 Jul 12.800000
#> 6        2 1974 Jan  5.000000
#> 7        2 1974 Feb  5.000000
#> 8        2 1974 Apr  8.600000
#> 9        2 1974 Jun  8.133333
#> 10       2 1974 Aug 12.800000

# Create a 'regular' tsibble (with gaps)
as_tsibble(DF, key = "station", index = "Time")
#> # A tsibble: 10 x 3 [1M]
#> # Key:       station [2]
#>    station     Time WaterTemp
#>      <int>    <mth>     <dbl>
#>  1       1 1974 Jan      5   
#>  2       1 1974 Feb      5   
#>  3       1 1974 Mar      8.60
#>  4       1 1974 May      8.13
#>  5       1 1974 Jul     12.8 
#>  6       2 1974 Jan      5   
#>  7       2 1974 Feb      5   
#>  8       2 1974 Apr      8.60
#>  9       2 1974 Jun      8.13
#> 10       2 1974 Aug     12.8
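Before deciding how to handle the gaps, it can help to locate them. A minimal sketch using tsibble's `interval()`, `has_gaps()` and `count_gaps()` on the same data (rebuilt with already-parsed dates; the object names are illustrative):

```r
library(tsibble)
library(dplyr)

# Same sample data, with the dd-mm-yyyy labels already parsed
obs <- tibble(
  station = rep(1:2, each = 5),
  Time = yearmonth(as.Date(c(
    "1974-01-01", "1974-02-01", "1974-03-01", "1974-05-01", "1974-07-01",
    "1974-01-01", "1974-02-01", "1974-04-01", "1974-06-01", "1974-08-01"
  ))),
  WaterTemp = rep(c(5, 5, 8.6000004, 8.1333332, 12.7999999), 2)
)
regular <- as_tsibble(obs, key = station, index = Time)

interval(regular)    # common interval inferred from the data: one month
has_gaps(regular)    # one row per station, with a logical .gaps column
count_gaps(regular)  # one row per run of missing months (.from, .to, .n)
```

Here station 1 is missing April and June, and station 2 is missing March, May and July, so `count_gaps()` reports five single-month gaps.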

To fill in the gaps of this dataset (similarly to what is shown in the linked question), you can use the tsibble::fill_gaps() function. This makes the data compatible with models that support missing values but not gaps in the data, such as fable::ARIMA().

# Create a 'regular' tsibble (with gaps) then complete the gaps
as_tsibble(DF, key = "station", index = "Time") %>% 
  fill_gaps()
#> # A tsibble: 15 x 3 [1M]
#> # Key:       station [2]
#>    station     Time WaterTemp
#>      <int>    <mth>     <dbl>
#>  1       1 1974 Jan      5   
#>  2       1 1974 Feb      5   
#>  3       1 1974 Mar      8.60
#>  4       1 1974 Apr     NA   
#>  5       1 1974 May      8.13
#>  6       1 1974 Jun     NA   
#>  7       1 1974 Jul     12.8 
#>  8       2 1974 Jan      5   
#>  9       2 1974 Feb      5   
#> 10       2 1974 Mar     NA   
#> 11       2 1974 Apr      8.60
#> 12       2 1974 May     NA   
#> 13       2 1974 Jun      8.13
#> 14       2 1974 Jul     NA   
#> 15       2 1974 Aug     12.8
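By default `fill_gaps()` only fills between each station's own first and last observation, which is why station 1 above stops at July. As a sketch, the `.full = TRUE` argument instead pads every station to the full time span of the data, so both stations run from January to August 1974:

```r
library(tsibble)
library(dplyr)

# Same sample data, with the dd-mm-yyyy labels already parsed
obs <- tibble(
  station = rep(1:2, each = 5),
  Time = yearmonth(as.Date(c(
    "1974-01-01", "1974-02-01", "1974-03-01", "1974-05-01", "1974-07-01",
    "1974-01-01", "1974-02-01", "1974-04-01", "1974-06-01", "1974-08-01"
  ))),
  WaterTemp = rep(c(5, 5, 8.6000004, 8.1333332, 12.7999999), 2)
)
regular <- as_tsibble(obs, key = station, index = Time)

# Default: gaps filled within each station's own time span (15 rows)
filled <- fill_gaps(regular)

# .full = TRUE: every station covers Jan to Aug 1974 (16 rows)
filled_full <- fill_gaps(regular, .full = TRUE)

# The added rows hold explicit NA values for WaterTemp
sum(is.na(filled_full$WaterTemp))
```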

An irregular time series can be created using `regular = FALSE`. This is typically useful if you're working with event data. In that case you would rarely want to fill the gaps, because there would be so many of them.

# Create an 'irregular' tsibble (no concept of gaps)
as_tsibble(DF, key = "station", index = "Time", regular = FALSE)
#> # A tsibble: 10 x 3 [!]
#> # Key:       station [2]
#>    station     Time WaterTemp
#>      <int>    <mth>     <dbl>
#>  1       1 1974 Jan      5   
#>  2       1 1974 Feb      5   
#>  3       1 1974 Mar      8.60
#>  4       1 1974 May      8.13
#>  5       1 1974 Jul     12.8 
#>  6       2 1974 Jan      5   
#>  7       2 1974 Feb      5   
#>  8       2 1974 Apr      8.60
#>  9       2 1974 Jun      8.13
#> 10       2 1974 Aug     12.8
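An irregular tsibble carries no interval, so it has no notion of gaps to fill. If the data turn out to be regularly spaced after all, one simple way to recover the interval, sketched below, is to drop back to a plain tibble and rebuild the tsibble so the interval is inferred again:

```r
library(tsibble)
library(dplyr)

# Same sample data, with the dd-mm-yyyy labels already parsed
obs <- tibble(
  station = rep(1:2, each = 5),
  Time = yearmonth(as.Date(c(
    "1974-01-01", "1974-02-01", "1974-03-01", "1974-05-01", "1974-07-01",
    "1974-01-01", "1974-02-01", "1974-04-01", "1974-06-01", "1974-08-01"
  ))),
  WaterTemp = rep(c(5, 5, 8.6000004, 8.1333332, 12.7999999), 2)
)

irregular <- as_tsibble(obs, key = station, index = Time, regular = FALSE)
is_regular(irregular)  # FALSE: printed with [!] instead of [1M]

# Rebuild as a regular tsibble; the one-month interval is inferred from the index
regular <- as_tsibble(as_tibble(irregular), key = station, index = Time)
is_regular(regular)    # TRUE

# Gaps can now be made explicit
fill_gaps(regular)
```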

Created on 2021-02-09 by the reprex package (v0.3.0)