My goal is to exactly re-create the top say 3 models from the auto.arima
function in R. My example uses the following series:
> data <- c(79, 73, 102, 158, 235, 326, 216, 153, 82, 87, 94, 163, 119, 81, 143,
179, 182, 247, 248, 105, 143, 64, 45, 163, 122, 95, 47, 117, 345, 454,
596, 440, 207, 94, 100, 187, 106, 98, 66, 81, 178, 196, 279, 128, 80,
178, 112, 175, 91, 52, 35, 14, 20, 332, 386, 121, 167, 53, 94, 224,
76, 68, 42, 61, 125, 273, 431, 1901, 86, 65, 120, 227, 131, 84, 48, 153, 157)
> time_series <- ts(data, freq=12, start=c(2013, 10))
I use auto.arima
function shown below to capture the model information for the best model and see in text the top models.
> library(forecast)
> auto_mod <- auto.arima(time_series, trace=TRUE)
ARIMA(2,0,2)(1,0,1)[12] with non-zero mean : 1067.045
ARIMA(0,0,0) with non-zero mean : 1057.566
ARIMA(1,0,0)(1,0,0)[12] with non-zero mean : 1057.504
ARIMA(0,0,1)(0,0,1)[12] with non-zero mean : 1057.727
ARIMA(0,0,0) with zero mean : 1092.138
ARIMA(1,0,0) with non-zero mean : 1055.976
ARIMA(1,0,0)(0,0,1)[12] with non-zero mean : 1057.629
ARIMA(1,0,0)(1,0,1)[12] with non-zero mean : 1058.261
ARIMA(2,0,0) with non-zero mean : 1058.174
ARIMA(1,0,1) with non-zero mean : 1058.187
ARIMA(0,0,1) with non-zero mean : 1056.126
ARIMA(2,0,1) with non-zero mean : Inf
ARIMA(1,0,0) with zero mean : 1070.847
Best model: ARIMA(1,0,0) with non-zero mean
> summary(auto_mod)
Series: time_series
ARIMA(1,0,0) with non-zero mean
Coefficients:
ar1 mean
0.2170 176.2688
s.e. 0.1105 32.0040
sigma^2 estimated as 49991: log likelihood=-524.82
AIC=1055.65 AICc=1055.98 BIC=1062.68
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 0.3042569 220.664 104.5716 -76.32587 96.76548 1.051865
ACF1
Training set 0.006605514
I thought I could use the text output to exactly re-create the models other than the top 1 but I have realized that the description of the models in the text output comes close, but does not seem to exactly, identify each model uniquely. I think it gives information about if there is an intercept or if there is drift, but not both. I've found one such example below where two different models, one with an intercept and one without, have the same label.
> mod1 <- Arima(time_series, order=c(1,0,0), seasonal=c(1,0,0), include.drift=TRUE,
+ include.mean=TRUE)
> summary(mod1)
Series: time_series
ARIMA(1,0,0)(1,0,0)[12] with drift
Coefficients:
ar1 sar1 intercept drift
0.1722 0.1976 132.4150 1.2188
s.e. 0.1175 0.2059 68.9234 1.5233
sigma^2 estimated as 50174: log likelihood=-524.15
AIC=1058.31 AICc=1059.15 BIC=1070.03
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set -0.05272209 218.099 99.62455 -76.12583 94.97947 1.002104
ACF1
Training set 0.004756765
> mod2 <- Arima(time_series, order=c(1,0,0), seasonal=c(1,0,0), include.drift=TRUE,
+ include.mean=FALSE)
> summary(mod2)
Series: time_series
ARIMA(1,0,0)(1,0,0)[12] with drift
Coefficients:
ar1 sar1 drift
0.2014 0.2862 3.6940
s.e. 0.1205 0.1960 0.9009
sigma^2 estimated as 51238: log likelihood=-525.77
AIC=1059.53 AICc=1060.09 BIC=1068.91
Training set error measures:
ME RMSE MAE MPE MAPE MASE
Training set 19.14777 221.9052 107.1779 -57.65942 96.53528 1.078082
ACF1
Training set -0.01092874
So in summary my question is: Is there a way to use an auto-fitting process in R to get the top x arima models? I'm open to any solution but my order of preference for a solution would be:
- A way to pull model objects for top x models out of
auto.arima
- A way to pull the information on top x models in text or understand the trace argument better in order to back into creating the models myself.
- Some other package or functionality
Thanks in advance for any help that can be offered!
The information from the
forecast::auto.arima
trace uniquely determines the model.Let's explicitly verify this for the third model (using your sample data):
We use base R's
arima
to manually fit the same modelWhen using ML to estimate model parameters,
arima
returns the AIC value; the trace offorecast::auto.arima
on the other hand returns the corrected (for small sample size) AIC. Thanks to this post on Cross Validated, we can write a custom function that calculates the AICc based on the model parameters.We confirm that the corrected AIC is the same as the one reported in the trace of
forecast::auto.arima
:forecast::Arima
automatically returns both AIC and AICc values, and fit results coincide with those fromarima
(unsurprisingly, asforecast::Arima
internally callsarima
) andforecast::auto.arima
:In comparison to base R's
arima
,forecast::Arima
allows for more flexibility on the inclusion of constants: whilearima
allows for a constant offset throughinclude.mean
,forecast::Arima
also allows for a time-dependent (deterministic) drift parameter. For more details, see Constants and ARIMA models in R.I'm not aware of an already-available way to obtain a list of top n model fits from
forecast::auto.arima
. The trace only prints intermediate results from the search in parameter space to the console. In the past, I have defined & written my own parameter grid search of SARIMA model parameters to determine an optimal fit model based on one of the information criteria; however,forecast::auto.arima
is more sophisticated and also considers (i.e. excludes) model fits with unit roots close to the unit circle. Your best bet is to either define your own grid search and have it return a list of the top n models; or modify the code offorecast::auto.arima
to return such a list; or (probably the least elegant solution) capture and parse the output ofauto.arima(... trace = TRUE)
to extract model parameters and re-fit usingarima
orforecast::Arima
.Update 1
I did say that capturing the output of
auto.arima
was probably the least elegant method, but this is what I did end up doing; perhaps this is helpful to you as a starting point:Let's define a function that parses an ARIMA model string as generated from
auto.arima
Then we can use
capture.output
to capture the trace offorecast::auto.arima
and parse the model stringsNow that you have a
data.frame
/tibble
with the model parameters it's straightforward to use them inarima
orforecast::Arima
.Update 2
Regarding the identifiability of a SARIMA model based on the
auto.arima
model string, let's take a look at some specific models.With differencing
Model 1 is an ARMA model with a constant
Model 2 is an ARIMA model with drift
We can expand the model to get
Model 3 is an ARIMA model with drift and a constant
When we expand model 3 we notice that the constant term cancels due to the differencing, and we end up with the same model 2.
So
d=1
models with drift and drift+constant describe the same process. In this case, reporting a model as e.g. "ARIMA(1,1,0) with drift" would refer to the same model as saying "ARIMA(1,1,0) with drift and non-zero mean".No differencing
For
d=0
, usinginclude.drift = TRUE
by default results in a drift and constant offset parameter, see e.g. Constants and ARIMA models in R. If you specifyinclude.drift = TRUE
and then manually forceinclude.mean = FALSE
you get a different model (as you show inmod1
andmod2
). From what I understand, this will never happen inforecast::auto.arima
, whereinclude.drift = TRUE
implicitly always hasinclude.mean = TRUE
. So a SARIMA stringARIMA(1,0,0)(1,0,0)[12] with drift
fromforecast::auto.arima
would translate to a SARIMA(1,0,0)(1,0,0)[12] model with drift and offset.