How can I add exogenous variables to ARIMA model estimation when using the fable package with model()?


I am trying to estimate ARIMA models for 100 different series, so I used the fabletools::model() method with the fable::ARIMA() function. However, I couldn't get my exogenous variables into the model estimation.

My data has three columns: ID, identifying the outlet; Date.Time; and Sales. In addition to these, I have dummy variables representing hour of day and day of week.

[Image: Dummy Variables]

Using the code below, I converted the data frame containing my endogenous and exogenous variables to a tsibble and fitted the models:

ts_forecast <- df11 %>%
  select(-Date) %>%
  mutate(ID = factor(ID)) %>% group_by(ID) %>%
  as_tsibble(index = Date.Time, key = ID) %>%
  tsibble::fill_gaps(Sales = 0) %>%
  fabletools::model(Arima = ARIMA(Sales, stepwise = TRUE, xreg = df12))

With this code I am trying to forecast values over the same Date.Time interval for multiple outlets identified by the ID factor. However, the code returns the following error:

>     Could not find an appropriate ARIMA model.
>     This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
>     For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots
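The message refers to the characteristic roots of the ARIMA polynomials. As a base-R sketch with made-up AR(2) coefficients (not estimated from this data), stationarity requires every root of 1 - phi1*z - phi2*z^2 to lie outside the unit circle; according to the message above, automatic selection additionally skips candidates whose roots may be numerically unstable, that is, too close to that circle.

```r
# Made-up AR(2) coefficients, for illustration only.
phi <- c(0.5, 0.3)

# Characteristic polynomial 1 - phi1*z - phi2*z^2.
# polyroot() expects coefficients in increasing order of power.
roots <- polyroot(c(1, -phi))

# Moduli of the roots; the AR part is stationary if all exceed 1.
Mod(roots)
all(Mod(roots) > 1)
```

polyroot() is part of base R, so this runs without fable installed.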

Sales is my endogenous target variable, and df12 contains the dummy variables for hour of day and day of week. Some stores have no sales in certain hours, so for those stores a dummy such as the one for 01:00 AM may be zero for every observation. I didn't expect this to be a problem, since fable uses a stepwise method; I assumed a variable that is always 0 would simply be excluded.

I am not sure what the problem is. Am I adding xreg to the model incorrectly (I understood from the ARIMA help page that passing xreg = as in the older forecast package is OK), or is the issue the one I mentioned above, dummies that are 0 for every observation? If it is the latter, is there a way to exclude all variables that are constantly 0?
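Dropping columns that are zero in every row can be done before fitting. The base-R sketch below assumes the dummies sit in a plain data frame of numeric columns; the helper name drop_all_zero_cols and the toy data are hypothetical. With many outlets, it would need to be applied to each outlet's regressors separately, since a column can be all zero for one store but not another.

```r
# Hypothetical helper: keep only columns with at least one non-zero value.
drop_all_zero_cols <- function(x) {
  keep <- vapply(x, function(col) any(col != 0, na.rm = TRUE), logical(1))
  x[, keep, drop = FALSE]
}

# Made-up dummy data: h01 is never 1.
dummies <- data.frame(
  h00 = c(1, 0, 0, 1),
  h01 = c(0, 0, 0, 0),
  h02 = c(0, 1, 1, 0)
)

cleaned <- drop_all_zero_cols(dummies)
names(cleaned)
```

Note that a column that is constant but non-zero is also collinear with the model's intercept; this helper only removes all-zero columns.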

I would be grateful for any help.

Thanks

Best answer, by Rob Hyndman:

Here is an example using the hourly pedestrian count data (the pedestrian dataset shipped with tsibble).

library(dplyr)
library(tsibble)
library(fable)

# Hourly tsibble, with a day-of-week factor added
df <- pedestrian %>%
  mutate(dow = lubridate::wday(Date, label=TRUE))
# Training data
train <- df %>% 
  filter(Date <= "2015-01-31")
# Fit models
fit <- train %>%
  model(arima = ARIMA(Count ~ season("day") + dow + pdq(2,0,0) + PDQ(0,0,0)))
# Future regressor values for the forecast period (one week)
fcast_xregs <- df %>%
  filter(Date > "2015-01-31", Date <= "2015-02-07") 
# Forecasts
fit %>% 
  forecast(fcast_xregs)
#> # A fable: 504 x 8 [1h] <Australia/Melbourne>
#> # Key:     Sensor, .model [3]
#>    Sensor .model Date_Time                     Count  .mean Date        Time
#>    <chr>  <chr>  <dttm>                       <dist>  <dbl> <date>     <int>
#>  1 Birra… arima  2015-02-01 00:00:00  N(-67, 174024)  -67.1 2015-02-01     0
#>  2 Birra… arima  2015-02-01 01:00:00 N(-270, 250881) -270.  2015-02-01     1
#>  3 Birra… arima  2015-02-01 02:00:00 N(-286, 310672) -286.  2015-02-01     2
#>  4 Birra… arima  2015-02-01 03:00:00 N(-283, 351704) -283.  2015-02-01     3
#>  5 Birra… arima  2015-02-01 04:00:00 N(-264, 380588) -264.  2015-02-01     4
#>  6 Birra… arima  2015-02-01 05:00:00  N(-244, 4e+05) -244.  2015-02-01     5
#>  7 Birra… arima  2015-02-01 06:00:00 N(-137, 414993) -137.  2015-02-01     6
#>  8 Birra… arima  2015-02-01 07:00:00   N(93, 424929)   93.0 2015-02-01     7
#>  9 Birra… arima  2015-02-01 08:00:00  N(292, 431894)  292.  2015-02-01     8
#> 10 Birra… arima  2015-02-01 09:00:00  N(225, 436775)  225.  2015-02-01     9
#> # … with 494 more rows, and 1 more variable: dow <ord>

Created on 2020-10-09 by the reprex package (v0.3.0)

Notes:

  • You don't need to create dummy variables in R. The formula interface will handle categorical variables appropriately.
  • The season("day") special within ARIMA will generate the appropriate seasonal categorical variable, equivalent to 23 hourly dummy variables.
  • I've specified a particular ARIMA model to save computation time. Omit the pdq() special to have ARIMA() select the non-seasonal orders automatically.
  • Keep the PDQ(0,0,0) special: the seasonality is already handled by the exogenous variables, so the ARIMA model does not need seasonal terms.
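To see what "equivalent to 23 hourly dummy variables" means, base R's model.matrix() shows how a formula expands a 24-level hour factor. This sketch uses made-up data and illustrates general R formula behaviour with default treatment contrasts; it is not a look at fable's internals.

```r
# Made-up example: 48 hourly observations (two days).
hours <- data.frame(hour = factor(rep(0:23, times = 2)))

# Expand the factor as a model formula does (treatment contrasts).
X <- model.matrix(~ hour, data = hours)

dim(X)            # 48 rows, 24 columns: intercept + 23 hour dummies
colnames(X)[1:3]  # "(Intercept)" "hour1" "hour2"
```

The first level (hour 0) is absorbed into the intercept, which is why a full day needs only 23 dummies rather than 24.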