I have been trying to understand how STATA, R, and Python handle SARIMA forecasting differently, and why the same models have produced different predictions. I started trying to match coefficients, and have been unable to get matching results, despite trying many different command parameters. I have tried relatively simple hyperparameters, with and without a constant.
The code I have used for each and the results are listed below. (Note: the data is not stationary.) I have tried with and without constants. STATA finds the MA and season AR terms as not statistically significant, while R and Python finds they are significant. The R and Python results are similar excepting the intercept.
I want to understand what is going on with each process, so I can decide which software to use.
R Code and Results
### R code
ts = c(97.3792037, 95.95151675, 91.31119854, 104.8981482, 105.6913949, 109.9757122, 112.0637876, 107.5345058, 105.7025947, 102.9343384, 99.05647977, 96.41892813, 98.65541846, 96.03831397, 96.16089129, 104.7498141, 104.1124595, 118.547745, 111.2792591, 107.334795, 108.5811032, 104.6114119, 102.2481883, 97.9402986, 97.50978289, 97.18689356, 96.14751328, 104.4959885, 111.2902446, 116.7373247, 111.0557771, 108.2667206, 110.4096783, 104.3825081, 105.0318585, 100.003698, 99.76290231, 100.2887926, 95.43952968, 109.5714367, 112.2008799, 117.2256497, 113.5328799, 113.6856444, 112.138176, 107.5428538, 106.0787703, 100.4712412, 102.7647221, 99.52197604, 93.54484612, 110.8328027, 111.6206782, 115.5910999, 112.3579283, 114.219969, 109.6037091, 107.2897836, 106.0347934, 96.26569125, 99.31431129, 97.28041353, 94.8177262, 108.4643059, 108.629378, 111.9023327, 114.5230675, 113.9775546, 110.0124363, 111.4763949, 110.63024, 105.8363866, 103.6651122, 99.88333821, 99.58756215, 109.5469192, 111.4346437, 113.7493204, 110.9739191, 110.2230651, 110.5110836, 110.3259096, 106.0284998, 106.0575086, 106.3202784, 102.9241831, 103.9779438, 111.4025093, 116.5582041, 124.8018515, 123.7163744, 116.111585, 117.509964, 114.6407459, 112.7685792, 108.6699399, 106.2470851, 107.4548057, 107.4706578, 118.5840337, 119.1763486, 127.357542, 123.0865202, 124.1670008, 122.9045664, 119.0590287, 120.3916487, 114.4012977, 114.4010663, 110.6166835, 115.4359241, 124.4532449, 126.4941134, 134.3627807, 128.3255046, 128.9491699, 126.3461706, 119.4532097, 124.8579134, 120.5122081)
library(forecast)
model = Arima(ts, order = c(1,0,1), seasonal = list(order = c(1,0,0), period = 12), method = "ML")
| ar1 | ma1 | sar1 | mean | |
|---|---|---|---|---|
| Coefficients | 0.9127 | -0.4565 | 0.8696 | 111.8831 |
| se | 0.0438 | 0.1094 | 0.0414 | 7.6850 |
Python Code and Results
### Python code
import numpy as np
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
ts = [97.3792037, 95.95151675, 91.31119854, 104.8981482, 105.6913949, 109.9757122, 112.0637876, 107.5345058, 105.7025947, 102.9343384, 99.05647977, 96.41892813, 98.65541846, 96.03831397, 96.16089129, 104.7498141, 104.1124595, 118.547745, 111.2792591, 107.334795, 108.5811032, 104.6114119, 102.2481883, 97.9402986, 97.50978289, 97.18689356, 96.14751328, 104.4959885, 111.2902446, 116.7373247, 111.0557771, 108.2667206, 110.4096783, 104.3825081, 105.0318585, 100.003698, 99.76290231, 100.2887926, 95.43952968, 109.5714367, 112.2008799, 117.2256497, 113.5328799, 113.6856444, 112.138176, 107.5428538, 106.0787703, 100.4712412, 102.7647221, 99.52197604, 93.54484612, 110.8328027, 111.6206782, 115.5910999, 112.3579283, 114.219969, 109.6037091, 107.2897836, 106.0347934, 96.26569125, 99.31431129, 97.28041353, 94.8177262, 108.4643059, 108.629378, 111.9023327, 114.5230675, 113.9775546, 110.0124363, 111.4763949, 110.63024, 105.8363866, 103.6651122, 99.88333821, 99.58756215, 109.5469192, 111.4346437, 113.7493204, 110.9739191, 110.2230651, 110.5110836, 110.3259096, 106.0284998, 106.0575086, 106.3202784, 102.9241831, 103.9779438, 111.4025093, 116.5582041, 124.8018515, 123.7163744, 116.111585, 117.509964, 114.6407459, 112.7685792, 108.6699399, 106.2470851, 107.4548057, 107.4706578, 118.5840337, 119.1763486, 127.357542, 123.0865202, 124.1670008, 122.9045664, 119.0590287, 120.3916487, 114.4012977, 114.4010663, 110.6166835, 115.4359241, 124.4532449, 126.4941134, 134.3627807, 128.3255046, 128.9491699, 126.3461706, 119.4532097, 124.8579134, 120.5122081]
ts_df = pd.DataFrame({'ts': ts})
model = SARIMAX(ts_df\['ts'\], order=(1, 0, 1), seasonal_order=(1, 0, 0, 12), trend='c', enforce_stationarity=False)
result = model.fit(disp=0, maxiter=100, method="bfgs", cov_type = "oim")
print(result.summary())
| coef | std err | z | P>|z| | [0.025 | 0.975] | |
|---|---|---|---|---|---|---|
| intercept | 1.4956 | 1.032 | 1.449 | 0.147 | -0.528 | 3.519 |
| ar.L1 | 0.8969 | 0.062 | 14.532 | 0.000 | 0.776 | 1.018 |
| ma.L1 | -0.5144 | 0.119 | -4.324 | 0.000 | -0.748 | -0.281 |
| ar.S.L12 | 0.8910 | 0.049 | 18.367 | 0.000 | 0.796 | 0.986 |
| sigma2 | 7.6082 | 1.040 | 7.314 | 0.000 | 5.569 | 9.647 |
STATA Code and Results
* STATA code
clear all
input int time float ts
1 97.3792037
2 95.95151675
3 91.31119854
4 104.8981482
5 105.6913949
6 109.9757122
7 112.0637876
8 107.5345058
9 105.7025947
10 102.9343384
11 99.05647977
12 96.41892813
13 98.65541846
14 96.03831397
15 96.16089129
16 104.7498141
17 104.1124595
18 118.547745
19 111.2792591
20 107.334795
21 108.5811032
22 104.6114119
23 102.2481883
24 97.9402986
25 97.50978289
26 97.18689356
27 96.14751328
28 104.4959885
29 111.2902446
30 116.7373247
31 111.0557771
32 108.2667206
33 110.4096783
34 104.3825081
35 105.0318585
36 100.003698
37 99.76290231
38 100.2887926
39 95.43952968
40 109.5714367
41 112.2008799
42 117.2256497
43 113.5328799
44 113.6856444
45 112.138176
46 107.5428538
47 106.0787703
48 100.4712412
49 102.7647221
50 99.52197604
51 93.54484612
52 110.8328027
53 111.6206782
54 115.5910999
55 112.3579283
56 114.219969
57 109.6037091
58 107.2897836
59 106.0347934
60 96.26569125
61 99.31431129
62 97.28041353
63 94.8177262
64 108.4643059
65 108.629378
66 111.9023327
67 114.5230675
68 113.9775546
69 110.0124363
70 111.4763949
71 110.63024
72 105.8363866
73 103.6651122
74 99.88333821
75 99.58756215
76 109.5469192
77 111.4346437
78 113.7493204
79 110.9739191
80 110.2230651
81 110.5110836
82 110.3259096
83 106.0284998
84 106.0575086
85 106.3202784
86 102.9241831
87 103.9779438
88 111.4025093
89 116.5582041
90 124.8018515
91 123.7163744
92 116.111585
93 117.509964
94 114.6407459
95 112.7685792
96 108.6699399
97 106.2470851
98 107.4548057
99 107.4706578
100 118.5840337
101 119.1763486
102 127.357542
103 123.0865202
104 124.1670008
105 122.9045664
106 119.0590287
107 120.3916487
108 114.4012977
109 114.4010663
110 110.6166835
111 115.4359241
112 124.4532449
113 126.4941134
114 134.3627807
115 128.3255046
116 128.9491699
117 126.3461706
118 119.4532097
119 124.8579134
120 120.5122081
end
tsset time
arima ts, arima(1,0,1) sarima(1,0,0,12) technique(bfgs) diffuse
| ts | Coefficient | std. err. | z | P>|z| | [95% conf. | interval] |
|---|---|---|---|---|---|---|
| _cons | 110.5248 | 2.698564 | 40.96 | 0.000 | 105.2357 | 115.8139 |
| ARMA | ||||||
| ar | L1. | .8340737 | .0504238 | 16.54 | 0.000 | .7352449 | .9329025 |
| ma | L1. | -.0000333 | .0003224 | -0.10 | 0.918 | -.0006651 | .0005986 |
| ARMA12 | ||||||
| ar | L1. | .0000193 | .0001236 | 0.16 | 0.876 | -.0002229 | .0002616 |
| /sigma | 4.828537 | .3242497 | 14.89 | 0.000 | 4.19302 | 5.464055 |