I am trying to model a time series using ARIMA in Python. I ran the function statsmodels.tsa.stattools.arma_order_select_ic
on the original data series and got p = 2 and q = 2. The code is as follows:
import pandas as pd
import statsmodels.api as sm

dates=pd.date_range('2010-11-1','2011-01-30')
dataseries=pd.Series([22,624,634,774,726,752,38,534,722,678,750,690,686,26,708,606,632,632,632,584,28,576,474,536,512,464,436,24,448,408,528,
602,638,640,26,658,548,620,534,422,482,26,616,612,622,598,614,614,24,644,506,522,622,526,26,22,738,582,592,408,466,568,
44,680,652,598,642,714,562,38,778,796,742,460,610,42,38,732,650,670,618,574,42,22,610,456,22,630,408,390,24],index=dates)
df=pd.DataFrame({'Consumption':dataseries})
df
sm.tsa.arma_order_select_ic(df, max_ar=4, max_ma=2, ic='aic')
The result is as follows:
{'aic':              0            1            2
0  1262.244974  1264.052640  1264.601342
1  1264.098325  1261.705513  1265.604662
2  1264.743786  1265.015529  1246.347400
3  1265.427440  1266.378709  1266.430373
4  1266.358895  1267.674168          NaN, 'aic_min_order': (2, 2)}
But when I run the augmented Dickey-Fuller test, the result indicates that the series is not stationary.
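d_order0 = sm.tsa.adfuller(dataseries)
print('adf:', d_order0[0])
print('p-value:', d_order0[1])
print('Critical values:', d_order0[4])
# d is not defined in the snippet as posted; the output below implies it was 1
d = 1
# Compare the test statistic with the 5% critical value
if d_order0[0] > d_order0[4]['5%']:
    print('Time Series is nonstationary')
    print(d)
else:
    print('Time Series is stationary')
    print(d)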
The output is as follows:
adf: -1.96448506629
p-value: 0.302358888762
Critical values: {'5%': -2.8970475206326833, '1%': -3.5117123057187376, '10%': -2.5857126912469153}
Time Series is nonstationary
1
When I cross-checked the results in R, it reported that the original series is stationary. Why, then, does the augmented Dickey-Fuller test in Python indicate a non-stationary series?
Your data clearly has some seasonality: roughly every seventh value drops to the 20-40 range. ARMA order selection and stationarity tests need to be applied with care on such data.
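One quick way to see the weekly pattern is to average the values by day of the week. The following is only a minimal sketch using the standard library and the values from the question; the variable names are made up for this example, and it assumes the series starts on 2010-11-01 as in the question's date range.

```python
from collections import defaultdict
from datetime import date, timedelta

# Values copied from the question, one per day starting 2010-11-01
values = [
    22, 624, 634, 774, 726, 752, 38, 534, 722, 678, 750, 690, 686, 26,
    708, 606, 632, 632, 632, 584, 28, 576, 474, 536, 512, 464, 436, 24,
    448, 408, 528, 602, 638, 640, 26, 658, 548, 620, 534, 422, 482, 26,
    616, 612, 622, 598, 614, 614, 24, 644, 506, 522, 622, 526, 26, 22,
    738, 582, 592, 408, 466, 568, 44, 680, 652, 598, 642, 714, 562, 38,
    778, 796, 742, 460, 610, 42, 38, 732, 650, 670, 618, 574, 42, 22,
    610, 456, 22, 630, 408, 390, 24,
]
start = date(2010, 11, 1)

# Group the observations by weekday (0 = Monday, ..., 6 = Sunday)
by_weekday = defaultdict(list)
for offset, value in enumerate(values):
    day = (start + timedelta(days=offset)).weekday()
    by_weekday[day].append(value)

# Mean consumption per weekday
weekday_mean = {day: sum(v) / len(v) for day, v in by_weekday.items()}
for day in sorted(weekday_mean):
    print(day, round(weekday_mean[day], 1))

# Weekday with the lowest average consumption
lowest_day = min(weekday_mean, key=weekday_mean.get)
print('lowest weekday:', lowest_day)
```

With pandas, the same grouping can be written as dataseries.groupby(dataseries.index.dayofweek).mean(). If one weekday stands out like this, a model with a period-7 seasonal component, or seasonal differencing before testing, is probably worth considering.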
Apparently, the ADF test gives different results in Python and R because the two implementations use a different default number of lags.
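To get a feel for how far apart the defaults can be, the sketch below computes both default lag settings for the 91 observations in the question. It assumes the R result came from adf.test in the tseries package, which the question does not state. The two helper functions are hypothetical names written for this example, not part of either library.

```python
import math


def statsmodels_default_maxlag(nobs):
    # adfuller's default upper bound on the lag: 12 * (nobs / 100) ** (1/4),
    # rounded up. With autolag='AIC' (the default), the lag actually used
    # is then chosen between 0 and this bound.
    return int(math.ceil(12.0 * (nobs / 100.0) ** 0.25))


def tseries_default_lag(nobs):
    # Assumption: R's tseries::adf.test default, k = trunc((n - 1) ** (1/3)),
    # which is used as a fixed lag order rather than an upper bound.
    return int(math.trunc((nobs - 1) ** (1.0 / 3.0)))


nobs = 91
print('statsmodels maxlag:', statsmodels_default_maxlag(nobs))  # 12
print('tseries lag:', tseries_default_lag(nobs))                # 4
```

To compare like with like, one could pass maxlag=4 and autolag=None to sm.tsa.adfuller so that a fixed lag is used. Note also that adfuller defaults to regression='c' (constant only), whereas tseries::adf.test includes a constant and a linear trend, so regression='ct' would be the closer match under that assumption.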