ARIMA model with non-normal error


I'm fitting an ARIMA(0,0,1) model in R with one exogenous variable.

After fitting, I tested the error term and it is highly non-normal (it looks like a t-distributed error).

My question is: is there any package in R that can fit an ARIMA model with t-distributed errors? Or is there any other remedy for this problem?

The data are already log-transformed, so I guess I cannot apply another transformation.

Thank you for your help in advance!


Here is the data:

dput(x)
c(1.098612289, 0, 1.791759469, 1.386294361, 0, 2.079441542, 2.772588722, 
2.564949357, 3.737669618, 3.761200116, 3.891820298, 3.555348061, 
2.944438979, 2.772588722, 1.791759469, 2.772588722, 2.564949357, 
3.258096538, 3.295836866, 2.890371758, 2.772588722, 2.197224577, 
4.077537444, 4.828313737, 5.855071922, 6.620073207, 7.561641746, 
7.887208586, 7.557472902, 6.747586527, 5.583496309, 4.465908119, 
3.526360525, 2.890371758, 2.564949357, 2.397895273, 2.302585093, 
0.693147181, 1.386294361, 0.693147181, 0.693147181, 0, 0, 1.098612289, 
0.693147181, 0, 0, 0, 0, 0, 0, 0, 0.693147181, 0.693147181, 0, 
0, 0.693147181, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0.693147181, 0, 0.693147181, 0.693147181, 1.386294361, 
0.693147181, 1.098612289, 2.564949357, 3.555348061, 4.744932128, 
4.615120517, 4.934473933, 4.779123493, 5.308267697, 5.303304908, 
5.416100402, 5.379897354, 5.153291594, 5.081404365, 4.927253685, 
4.86753445, 4.356708827, 4.060443011, 3.891820298, 3.091042453, 
3.091042453, 2.995732274, 2.302585093, 2.079441542, 1.609437912, 
0.693147181, 0, 0)

dput(y)
c(-2.760818612, -0.969058209, -1.374522756, -2.760817117, -0.681374268, 
0.011775716, -0.195861406, 0.976866516, 1.000404862, 1.131034014, 
0.794568131, 0.183662413, 0.011814959, -0.96901336, 0.011818696, 
-0.195818426, 0.497333426, 0.535078613, 0.129616682, 0.01183645, 
-0.5635262, 1.316797505, 2.067596972, 3.094420195, 3.859561475, 
4.801489346, 5.127554079, 4.798176537, 3.988449441, 2.824408827, 
1.706836735, 0.767295318, 0.131309734, -0.19411042, -0.361162633, 
-0.456471128, -2.065908853, -1.372761111, -2.065908104, -2.065907917, 
-2.759055098, -2.759055098, -1.660442435, -2.065907356, -2.759054536, 
-2.759054536, -2.759054536, -2.759054536, -2.759054536, -2.759054536, 
-2.759054536, -2.065907168, -2.065906981, -2.759054162, -2.759054162, 
-2.065906794, -2.759053975, -2.759053975, -2.759053975, -2.759053975, 
-2.759053975, -2.759053975, -2.759053975, -2.759053975, -2.759053975, 
-2.759053975, -2.759053975, -2.759053975, -2.759053975, -2.759053975, 
-2.759053975, -2.759053975, -2.759053975, -2.759053975, -2.065906607, 
-2.759053787, -2.06590642, -2.065906232, -1.37275849, -2.065905484, 
-1.660440001, -0.194100686, 0.796304383, 1.985909791, 1.856116899, 
2.17549615, 2.020167801, 2.549349637, 2.544424292, 2.657261726, 
2.621099122, 2.394525569, 2.3226683, 2.168543275, 2.108848197, 
1.598036993, 1.301781851, 1.133168127, 0.332394215, 0.332398148, 
0.237091526, -0.456053969, -0.679196209, -1.149199089, -2.065489634, 
-2.758636814, -2.758636814, -2.758636814)

And my code:

y1 = y
x_data1 = matrix(c(x), ncol = 1)
ts_mod1 = arima(y1, order = c(0,0,1), xreg = x_data1)
ts_res1 = ts_mod1$residuals

qqnorm(ts_res1, main = "", cex.axis = 1.2, cex.lab = 1.45)
qqline(ts_res1, col = "red")

There are 2 answers

Tom Reilly

There is another package in R called Autobox. It is available from autobox.com (I am affiliated with it).

The standardized plot shows that X is related to Y.

[Figure: Normalized bivariate scatterplot]

A model with differencing, the x variable, and 3 outliers. Note that the .257 coefficient is much lower.

[Figure: Model with differencing]

By testing for variance change and using Weighted Least Squares (GLM), we have identified a change in the variance beginning at period 44. See the paper here.

[Figure: Tsay variance test]

[Figure: Residuals]
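
The variance-change idea described above can be illustrated with base R. This is only a rough sketch, not Tsay's procedure and not what Autobox does internally: it assumes a known candidate break at period 44, uses simulated data (the series, the regime standard deviations and the names `brk`, `regime`, `w` are all hypothetical), and ignores serial correlation in the residuals.

```r
# Sketch: test for a variance shift at a candidate break point, then
# refit by weighted least squares. Simulated data, illustrative only.
set.seed(42)
n    <- 107
brk  <- 44                                    # candidate break point
x    <- rnorm(n)
sd_t <- ifelse(seq_len(n) < brk, 1.0, 0.3)    # assumed regime standard deviations
y    <- 1 + 0.5 * x + rnorm(n, sd = sd_t)

# Ordinary least squares fit and its residuals
ols <- lm(y ~ x)
r   <- resid(ols)
regime <- factor(ifelse(seq_len(n) < brk, "before", "after"))

# F test for equal residual variance in the two regimes
vt <- var.test(r[regime == "before"], r[regime == "after"])
print(vt$p.value)

# Weighted least squares: each observation is weighted by the inverse
# of the residual variance estimated for its regime
w   <- as.numeric(1 / tapply(r, regime, var)[as.character(regime)])
wls <- lm(y ~ x, weights = w)
print(coef(summary(wls)))
```

A small p-value from `var.test` suggests the two regimes do not share a common variance, in which case the weighted fit gives less influence to the noisier stretch of the series.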

Ic3fr0g

This Q-Q plot is indicative of a heavy-tailed distribution. You can refer to this question to understand the various types of Q-Q plots. To answer your question, there are packages that do a better job of dealing with non-normal distributions. Try the forecast package:

require('forecast')
ts_mod1 <- auto.arima(y1,xreg = x_data1)
ts_mod1

# Series: y1 
# ARIMA(4,0,2) with non-zero mean 
# 
# Coefficients:
#          ar1      ar2     ar3      ar4      ma1     ma2  intercept  x_data1
#       0.7269  -0.3027  0.2060  -0.0391  -0.6260  0.4672    -2.4920   0.8695
# s.e.  0.4409   0.4004  0.1771   0.1796   0.4577  0.3664     0.2536   0.1102
# 
# sigma^2 estimated as 0.3996:  log likelihood=-99.8
# AIC=217.6   AICc=219.44   BIC=241.74

Here auto.arima automatically selects an ARIMA(4,0,2) model, whose AIC of 217.6 is lower (better) than the AIC of 219.96 for the ARIMA(0,0,1) model. The fit is better too, as shown by this Q-Q plot:

[Figure: Q-Q plot for ARIMA(4,0,2)]
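
On the original question of t-distributed errors: both `arima()` and `auto.arima()` estimate their models with a Gaussian likelihood, so a different model order does not change the error distribution being assumed. One option that needs no extra package is to maximize a Student-t likelihood directly with `optim()`. The following is a minimal sketch under several assumptions: the data are simulated rather than the series from the question, the likelihood conditions on a zero pre-sample innovation, the degrees of freedom are restricted to be greater than 2, and the function name `negloglik` and the parameterization are purely illustrative.

```r
# Sketch: regression with MA(1) errors and Student-t innovations,
# fitted by conditional maximum likelihood in base R. Simulated data.
set.seed(1)
n <- 200
x <- cumsum(rnorm(n))                          # hypothetical exogenous regressor
e <- 0.5 * rt(n, df = 4)                       # heavy-tailed innovations
y <- -2 + 0.8 * x + e + 0.5 * c(0, e[-n])      # MA(1) errors with theta = 0.5

# Negative log-likelihood.
# par = (intercept, slope, atanh(theta), log(scale), log(df - 2))
negloglik <- function(par, y, x) {
  u     <- y - par[1] - par[2] * x             # regression residuals
  theta <- tanh(par[3])                        # keeps |theta| < 1 (invertible)
  s     <- exp(par[4])                         # scale > 0
  nu    <- 2 + exp(par[5])                     # degrees of freedom > 2
  eps   <- numeric(length(u))
  prev  <- 0                                   # zero pre-sample innovation
  for (t in seq_along(u)) {
    eps[t] <- u[t] - theta * prev              # invert the MA(1) recursion
    prev   <- eps[t]
  }
  -sum(dt(eps / s, df = nu, log = TRUE) - log(s))
}

# Starting values from an ordinary least squares fit
ols   <- lm(y ~ x)
b     <- unname(coef(ols))
start <- c(b[1], b[2], 0, log(sd(resid(ols))), log(3))

fit <- optim(start, negloglik, y = y, x = x, method = "BFGS")
est <- c(intercept = fit$par[1], slope = fit$par[2],
         theta = tanh(fit$par[3]), scale = exp(fit$par[4]),
         df = 2 + exp(fit$par[5]))
print(round(est, 3))
```

To try this on the data from the question, replace the simulated `x` and `y` with the posted vectors. A small estimated `df` would support the heavy-tailed reading of the Q-Q plot, while a very large one would indicate that the Gaussian fit is adequate.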