System GMM in R: issues with the multi-part formula for instruments in pgmm and the Sargan test results


I'm working with a panel dataset of N = 30 countries and T = 15 years, using R and the plm package. Following Blundell and Bond (1998) and Arellano and Bover (1995), I decided to estimate a one-step System GMM model with individual effects only. However, I'm confused about two things: how to write the multi-part formula that the pgmm function requires to specify the model and its instruments, and how to read the Sargan-Hansen test results I get. My model contains the lagged dependent variable and one exogenous regressor. To keep the post short, I report only the model code, the main estimates, and the Sargan test results:

library(plm)

# Models 1, 3, 5: two-part formula (regressors | instruments)
# Models 2, 4, 6: three-part formula (regressors | instruments for the lagged dependent variable | log(GDPcap) terms)

sy_gmm1 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2:15) + log(GDPcap), data = europanel,
                index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")

sy_gmm2 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2:15) | log(GDPcap), data = europanel,
                index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")

sy_gmm3 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) + log(GDPcap), data = europanel,
                index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")

sy_gmm4 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) | log(GDPcap), data = europanel,
                index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")

sy_gmm5 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) + lag(log(GDPcap), 1:2), data = europanel,
                index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")

sy_gmm6 <- pgmm(log(GWPcap) ~ lag(log(GWPcap)) + log(GDPcap) | lag(log(GWPcap), 2) | lag(log(GDPcap), 1:2), data = europanel,
                index = c("country", "year"), model = "onestep", effect = "individual", transformation = "ld")

# Coefficients and p-values of estimates
  
                   Estimate       p.value Model
lag(log(GWPcap)) 0.90340911  1.525370e-86     1
log(GDPcap)      0.06214275  1.965823e-02     1
lag(log(GWPcap)) 0.97250426  0.000000e+00     2
log(GDPcap)      0.02222075  1.383774e-01     2
lag(log(GWPcap)) 0.81905400  2.615214e-47     3
log(GDPcap)      0.10822697  8.291284e-04     3
lag(log(GWPcap)) 0.82343976  4.873484e-16     4
log(GDPcap)      0.11164469  7.294118e-02     4
lag(log(GWPcap)) 0.84762245  2.754039e-87     5
log(GDPcap)      0.09281759  1.636567e-04     5
lag(log(GWPcap)) 0.86280993 3.843798e-104     6
log(GDPcap)      0.08809634  3.325890e-04     6

# Sargan test
                stat  df   p.value
    sargan1 30.00000 128 1.0000000
    sargan2 30.00000 104 1.0000000
    sargan3 30.00000  50 0.9888352
    sargan4 29.64127  26 0.2827384
    sargan5 30.00000  63 0.9998660
    sargan6 30.00000  28 0.3632178
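
A table like the one above can be assembled from the test objects. The following is a minimal sketch, assuming each test is an "htest" object with `statistic`, `parameter` and `p.value` elements (the class that `sargan()` in plm returns); the helper name `sargan_table` is hypothetical and not part of plm.

```r
# Collect statistic, degrees of freedom and p-value from a named list of
# "htest" objects into one data frame (one row per test).
sargan_table <- function(tests) {
  data.frame(
    stat      = vapply(tests, function(x) as.numeric(x$statistic), numeric(1)),
    df        = vapply(tests, function(x) as.numeric(x$parameter), numeric(1)),
    p.value   = vapply(tests, function(x) as.numeric(x$p.value), numeric(1)),
    row.names = names(tests)
  )
}

# Hypothetical usage with the fitted models from the post:
# sargan_table(list(sargan1 = sargan(sy_gmm1), sargan2 = sargan(sy_gmm2)))
```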

As you can see, in models 2, 4 and 6 I put the exogenous regressor log(GDPcap) in the third part of the formula, separating it from the instruments for the lagged dependent variable. I don't know whether this is the right way to write the formula: the R documentation says that the third part is for "normal" instruments. What does that mean? Because of this doubt, I ran an experiment in model 6, putting lag(log(GDPcap), 1:2) in the third part. In the resulting estimates log(GDPcap) is significant, and the Sargan test result looks acceptable.
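
For reference, the two layouts can be reproduced on public data. This is a minimal sketch on the EmplUK panel that ships with plm, not on the europanel data from the post; the choice of variables and the lag range 2:4 are illustrative assumptions. It only shows where each kind of instrument is written and does not settle which specification is correct.

```r
library(plm)
data("EmplUK", package = "plm")  # example firm panel bundled with plm

# Two-part formula: regressors | instruments.
# Here log(capital) is written in the same part as the lagged dependent variable.
fit_two <- pgmm(
  log(emp) ~ lag(log(emp), 1) + log(capital) |
    lag(log(emp), 2:4) + log(capital),
  data = EmplUK, effect = "individual", model = "onestep",
  transformation = "ld"
)

# Three-part formula: regressors | GMM instruments | "normal" instruments.
# Here log(capital) is moved to the third part.
fit_three <- pgmm(
  log(emp) ~ lag(log(emp), 1) + log(capital) |
    lag(log(emp), 2:4) | log(capital),
  data = EmplUK, effect = "individual", model = "onestep",
  transformation = "ld"
)

# Compare the overidentification tests of the two layouts.
sargan(fit_two)
sargan(fit_three)
```

Comparing the degrees of freedom of the two tests shows how moving a variable between the second and third part changes the number of instruments.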

I also noticed that the Sargan test results differ across models, in particular the degrees of freedom, which depend closely on the number of instruments used, and the p-value. From what I have read, using too many instruments can be a double-edged sword, especially with a sample as small as mine, and it can weaken the Sargan-Hansen test, producing an implausibly high p-value. So my question is: which of these six models is specified correctly, and how should I interpret the results, both the estimates (in some models the exogenous regressor is not significant) and the test?
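
To make the instrument-count concern concrete, here is a minimal sketch of the textbook count of GMM-style instruments for the lagged dependent variable in the first-differenced equations of a balanced panel. The function name `count_gmm_instruments` is hypothetical, and the count is an assumption-based illustration: it ignores the level equations and any standard instruments, so it is not the number plm computes internally.

```r
# In the differenced equation for period t, lag k of the dependent variable
# is usable as an instrument only if period t - k exists (t - k >= 1).
count_gmm_instruments <- function(n_periods, lags) {
  periods <- 3:n_periods  # first usable differenced equation is t = 3
  sum(vapply(periods, function(t) sum(t - lags >= 1), integer(1)))
}

n_countries <- 30
n_periods   <- 15

all_lags <- count_gmm_instruments(n_periods, 2:15)  # as in models 1 and 2
one_lag  <- count_gmm_instruments(n_periods, 2)     # as in models 3 to 6

c(all_lags = all_lags, one_lag = one_lag, N = n_countries)
```

For T = 15 this gives 91 instruments with lags 2:15 and 13 with lag 2 alone, before counting anything else; the first figure is already about three times N = 30.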

I hope I was clear and that someone can resolve my doubts. Thanks in advance.
