I'm working on a panel dataset with N = 30 countries and T = 15 years. I'm using R and the `plm` package for my analysis.
Based on Blundell and Bond (1998) and Arellano and Bover (1995), I decided to use a one-step System-GMM estimator with individual effects only.
However, I'm a little confused about how to use the `pgmm` function, which requires a multi-part formula to specify the model and its instruments, and about how to interpret the Sargan-Hansen test results I get.
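For reference, this is the model I am estimating, written out from the code below with $y_{it} = \log(\text{GWPcap}_{it})$, $x_{it} = \log(\text{GDPcap}_{it})$ and $\eta_i$ the individual effect. It is shown together with the moment conditions that System-GMM combines, as I understand them from Blundell and Bond (1998): lagged levels instrument the differenced equation, and lagged differences instrument the level equation. My unconfirmed reading is that `lag(log(GWPcap), 2:15)` in the second part of the `pgmm` formula corresponds to $s = 2, \dots, 15$ in the first condition, restricted to the lags that actually exist for each $t$.

$$
\begin{aligned}
y_{it} &= \alpha\, y_{i,t-1} + \beta\, x_{it} + \eta_i + \varepsilon_{it}, \\
E\left[\, y_{i,t-s}\, \Delta\varepsilon_{it} \,\right] &= 0, \qquad t = 3,\dots,T,\quad s \ge 2 \quad \text{(differenced equation)}, \\
E\left[\, \Delta y_{i,t-1}\, (\eta_i + \varepsilon_{it}) \,\right] &= 0, \qquad t = 3,\dots,T \quad \text{(level equation)}.
\end{aligned}
$$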
To be more concrete, here are the models I tried, followed by their estimates and Sargan test results. Each model includes the lagged dependent variable and one exogenous regressor. To keep the post short, I only report the model code and the main results, as tidily as I can:
```r
library(plm)

# Model 1: lags 2:15 of log(GWPcap) and log(GDPcap) together in the 2nd (GMM-instrument) part
sy_gmm1 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2:15) + log(GDPcap),
                data = europanel, index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")
# Model 2: as model 1, but log(GDPcap) moved to the 3rd ("normal instruments") part
sy_gmm2 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2:15) | log(GDPcap),
                data = europanel, index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")
# Model 3: only lag 2 of log(GWPcap), with log(GDPcap) in the 2nd part
sy_gmm3 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) + log(GDPcap),
                data = europanel, index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")
# Model 4: as model 3, but log(GDPcap) in the 3rd part
sy_gmm4 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) | log(GDPcap),
                data = europanel, index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")
# Model 5: lag 2 of log(GWPcap) plus lags 1-2 of log(GDPcap), all in the 2nd part
sy_gmm5 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) + lag(log(GDPcap), 1:2),
                data = europanel, index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")
# Model 6: as model 5, but lags 1-2 of log(GDPcap) in the 3rd part
sy_gmm6 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) | lag(log(GDPcap), 1:2),
                data = europanel, index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")
```
```
# Coefficients and p-values of estimates
                   Estimate       p.value Model
lag(log(GWPcap)) 0.90340911  1.525370e-86     1
log(GDPcap)      0.06214275  1.965823e-02     1
lag(log(GWPcap)) 0.97250426  0.000000e+00     2
log(GDPcap)      0.02222075  1.383774e-01     2
lag(log(GWPcap)) 0.81905400  2.615214e-47     3
log(GDPcap)      0.10822697  8.291284e-04     3
lag(log(GWPcap)) 0.82343976  4.873484e-16     4
log(GDPcap)      0.11164469  7.294118e-02     4
lag(log(GWPcap)) 0.84762245  2.754039e-87     5
log(GDPcap)      0.09281759  1.636567e-04     5
lag(log(GWPcap)) 0.86280993 3.843798e-104     6
log(GDPcap)      0.08809634  3.325890e-04     6

# Sargan test
            stat  df   p.value
sargan1 30.00000 128 1.0000000
sargan2 30.00000 104 1.0000000
sargan3 30.00000  50 0.9888352
sargan4 29.64127  26 0.2827384
sargan5 30.00000  63 0.9998660
sargan6 30.00000  28 0.3632178
```
As you can see, in models 2, 4 and 6 I put the exogenous regressor `log(GDPcap)` in the third part of the formula, separating it from the instruments for the lagged dependent variable. I don't know whether this is the right way to write the formula: the R documentation says the third part is for "normal instruments". What does that mean?
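To make the second-part vs. third-part difference concrete, here is a sketch on the `EmplUK` panel that ships with `plm`, since `europanel` is not included here. The choice of `log(emp)` and `log(wage)` as stand-ins for `log(GWPcap)` and `log(GDPcap)` is purely illustrative. It fits the same specification twice, with the regressor once in the GMM-instrument part (as in model 1) and once in the normal-instrument part (as in model 2), and compares the Sargan degrees of freedom as a proxy for the number of instruments.

```r
library(plm)
data("EmplUK", package = "plm")  # firm-level panel bundled with plm

# Illustrative stand-ins: log(emp) ~ log(GWPcap), log(wage) ~ log(GDPcap)

# log(wage) in the 2nd part: used as a GMM-type instrument (like model 1)
m_gmm_part <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) |
                     lag(log(emp), 2:99) + log(wage),
                   data = EmplUK, effect = "individual",
                   model = "onestep", transformation = "ld")

# log(wage) in the 3rd part: used as a "normal" (standard IV) instrument (like model 2)
m_normal_part <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) |
                        lag(log(emp), 2:99) | log(wage),
                      data = EmplUK, effect = "individual",
                      model = "onestep", transformation = "ld")

# The Sargan df is the number of instruments minus the number of coefficients,
# so with identical regressors a smaller df means fewer instruments
sargan(m_gmm_part)$parameter
sargan(m_normal_part)$parameter
```

In the second part a variable generates a separate instrument column for each period, while in the third part it does not, which is consistent with the drop in degrees of freedom between my models 1 and 2 (128 vs 104) and between models 3 and 4 (50 vs 26).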
Given this doubt, I tried an experiment in model 6, using `lag(log(GDPcap), 1:2)` as normal instruments. In the estimates `log(GDPcap)` is significant, and the Sargan test result looks reasonable.
Furthermore, I noticed that the Sargan test results differ considerably across models, both in the degrees of freedom, which depend closely on the number of instruments, and in the p-value. From what I have read, using too many instruments can be a double-edged sword, especially with a sample this small, because it can weaken the Sargan-Hansen test and produce implausibly high p-values.

So my questions are: which of these six models is specified correctly, and how should I interpret the results, both the estimates (in some models the exogenous regressor is not significant) and the test?
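To illustrate how the instrument count drives the Sargan result, here is a sketch on the `EmplUK` panel bundled with `plm`, with `log(emp)` and `log(wage)` as illustrative stand-ins for `log(GWPcap)` and `log(GDPcap)`. It fits the same specification twice: once with all available lags of the dependent variable, and once with only lags 2-3 and `collapse = TRUE`, an existing `pgmm` argument that collapses the GMM-type instrument matrix. It then tabulates the Sargan statistic, degrees of freedom and p-value for both. Whether restricting or collapsing instruments is appropriate for my own data is part of what I am asking.

```r
library(plm)
data("EmplUK", package = "plm")  # firm-level panel bundled with plm

# Many instruments: every available lag (2 and deeper) of the dependent variable
m_many <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) |
                 lag(log(emp), 2:99) | log(wage),
               data = EmplUK, effect = "individual",
               model = "onestep", transformation = "ld")

# Few instruments: lags 2-3 only, with the GMM instrument matrix collapsed
m_few <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) |
                lag(log(emp), 2:3) | log(wage),
              data = EmplUK, effect = "individual",
              model = "onestep", transformation = "ld",
              collapse = TRUE)

# sargan() returns an "htest" object; collect statistic, df and p-value
tests <- lapply(list(many = m_many, few = m_few), sargan)
sargan_table <- data.frame(
  stat    = sapply(tests, function(h) unname(h$statistic)),
  df      = sapply(tests, function(h) unname(h$parameter)),
  p.value = sapply(tests, function(h) h$p.value),
  row.names = names(tests)
)
print(sargan_table)
```

The same side-by-side layout could be applied to `sy_gmm1` to `sy_gmm6` to see how the p-value moves as the degrees of freedom shrink.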
I hope I was clear and that someone can resolve my doubts. Thanks in advance.