As described here, making one-step forecasts on the test set is a way of avoiding the inevitable increase in variance as the forecast horizon increases. That section mentions methods for performing one-step forecasts on a test set with an already-trained model using the forecast
package. Is there a similar way of performing one-step forecasts on test data using the newer fable
package? Perhaps the new_data
parameter described here handles this, for example, but I am not sure, since the forecasts for both h = 24
and new_data = x_test
below are identical:
> library(fable)      # attaches fabletools
> library(tsibble)    # as_tsibble()
> library(dplyr)      # filter()
> library(lubridate)  # year()
> x <- USAccDeaths %>%
+ as_tsibble()
> x
# A tsibble: 72 x 2 [1M]
      index value
      <mth> <dbl>
 1 1973 Jan  9007
 2 1973 Feb  8106
 3 1973 Mar  8928
 4 1973 Apr  9137
 5 1973 May 10017
 6 1973 Jun 10826
 7 1973 Jul 11317
 8 1973 Aug 10744
 9 1973 Sep  9713
10 1973 Oct  9938
# … with 62 more rows
> x_train <- x %>% filter(year(index) < 1977)
> x_test <- x %>% filter(year(index) >= 1977)
> fit <- x_train %>% model(arima = ARIMA(log(value) ~ pdq(0, 1, 1) + PDQ(0, 1, 1)))
> fit
# A mable: 1 x 1
arima
<model>
1 <ARIMA(0,1,1)(0,1,1)[12]>
> nrow(x_test)
[1] 24
> forecast(fit, h = 24)$.mean
[1] 7778.052 7268.527 7831.507 7916.845 8769.478 9144.790 10004.816 9326.874 8172.226
[10] 8527.355 8015.100 8378.166 7692.356 7191.343 7751.466 7839.085 8686.833 9062.247
[19] 9918.487 9250.101 8108.202 8463.933 7958.667 8322.497
> forecast(fit, new_data = x_test)$.mean
[1] 7778.052 7268.527 7831.507 7916.845 8769.478 9144.790 10004.816 9326.874 8172.226
[10] 8527.355 8015.100 8378.166 7692.356 7191.343 7751.466 7839.085 8686.833 9062.247
[19] 9918.487 9250.101 8108.202 8463.933 7958.667 8322.497
Answer and code
The model
argument available for many models in the forecast
package is equivalent to the refit()
method in the fable
package. When used with future data, it can be used to produce multiple one-step forecasts from an already-estimated model.

Created on 2020-10-13 by the reprex package (v0.3.0)
Explanation
The fitted()
values of a model are one-step-ahead forecasts, which can be used to evaluate 'training accuracy' (forecast accuracy on the training data). However, there is a catch: the model's parameters are estimated from the entire training set, so the training accuracy is better than what can be expected in practice, because the model contains some information about the points it is fitting.

The forecast()
function is used to produce forecasts of future time points that the model has never seen. You can produce a single one-step-ahead forecast with forecast(<mable>, h = 1)
, but this only gives one forecast. Instead, we want to produce a one-step-ahead forecast, add one new observation to the model, produce another one-step-ahead forecast beyond that new observation, and repeat until we run out of data.

This is where the refit()
function is useful. It takes an existing model and applies it to a new dataset. The refitting process involves computing one-step forecasts on that data (the fitted()
values). By setting reestimate = FALSE
, the model's estimated coefficients are not updated to better suit the new 'future' data. This resolves the issue of the model coefficients containing some information about the future values against which we are testing forecast accuracy.
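The effect of reestimate can be checked by comparing coefficient estimates; a minimal sketch, assuming the fit and x_test objects from the question (tidy() on a mable returns the estimated coefficients):

```r
# With reestimate = FALSE the coefficients are carried over from the
# training fit; with reestimate = TRUE they are re-estimated on x_test
tidy(fit)                                      # original training coefficients
tidy(refit(fit, x_test, reestimate = FALSE))   # same coefficients as above
tidy(refit(fit, x_test, reestimate = TRUE))    # generally different estimates
```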