Using a categorical variable in spreg models


I would like to employ a spatial regression model, using the spreg package in Python. My data consists of numeric variables, but I also have a categorical land cover variable (with 7 classes) that I need to include in the model. This works perfectly fine using statsmodels, but I haven't been able to figure out how to do this in spreg.

I have tried creating dummy variables manually (using pd.get_dummies(data['land_cover'])), but this results in an error message for my spreg.OLS model:

RuntimeWarning: invalid value encountered in sqrt
  se_result =np.sqrt(variance)

RuntimeWarning: invalid value encountered in sqrt
  tStat = betas[list(range(0, len(vm)))].reshape(len(vm),) / np.sqrt(variance)

All the dummy variables also have nan values in the Std.Error, t-Statistic and Probability sections of the results (see excerpt below).

        Variable     Coefficient       Std.Error     t-Statistic     Probability

        CONSTANT    -142.9375000             nan             nan             nan
     temperature       0.0136240       0.0001169     116.4984154       0.0000000
   precipitation       0.0000003       0.0000000     153.7448775       0.0000000
         cover_1     141.9375000             nan             nan             nan
         cover_2     142.0625000             nan             nan             nan
         cover_3     141.6875000             nan             nan             nan
         cover_4     142.0625000             nan             nan             nan
         cover_5     141.9375000             nan             nan             nan
         cover_6     141.6875000             nan             nan             nan
         cover_7     141.8125000             nan             nan             nan

Using statsmodels with the same data/variables, the output of the OLS model was this:

                            coef    std err          t      P>|t|
     temperature         -0.0004   2.72e-05    -15.115      0.000
   precipitation       -1.62e-08   4.12e-10    -39.294      0.000
         cover_1          0.0706      0.001    119.653      0.000
         cover_2          0.0290      0.001     29.431      0.000
         cover_3          0.0100      0.001      7.120      0.000 
         cover_4          0.0491      0.000    122.972      0.000
         cover_5          0.0327      0.000     79.698      0.000 
         cover_6          0.0140      0.000     35.541      0.000 
         cover_7         -0.0026      0.001     -4.223      0.000 

How can I include my categorical data in the spreg models (e.g. spreg.GM_Lag)?

There is 1 answer
Josef On

My guess is that you ran into the "dummy variable trap".

Your statsmodels model has no constant, but the spreg model includes one (see the CONSTANT row in its output).

If you don't drop a reference level in your categorical variable, the dummy columns sum to one in every row, so together they are perfectly collinear with the constant. The design matrix is then singular and the cross-product matrix X'X is not invertible, which would explain the nan standard errors.
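
To illustrate, here is a minimal sketch on made-up data (the DataFrame, its values, and the variable names below are assumptions for illustration, not the asker's data). It builds the design matrix with a constant once with all 7 dummies and once with one reference level dropped via `drop_first=True`, then compares the matrix rank with the number of columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the asker's table.
rng = np.random.default_rng(0)
n = 70
data = pd.DataFrame({
    "temperature": rng.normal(15.0, 5.0, n),
    "precipitation": rng.normal(800.0, 100.0, n),
    "land_cover": np.tile(np.arange(1, 8), n // 7),  # 7 classes
})

numeric = data[["temperature", "precipitation"]].to_numpy()
const = np.ones((n, 1))

# All 7 dummies: in every row exactly one of them is 1,
# so their sum reproduces the constant column.
full = pd.get_dummies(data["land_cover"], prefix="cover").to_numpy(dtype=float)
X_trap = np.hstack([const, numeric, full])

# Dropping one reference level removes the exact collinearity.
reduced = pd.get_dummies(
    data["land_cover"], prefix="cover", drop_first=True
).to_numpy(dtype=float)
X_ok = np.hstack([const, numeric, reduced])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

# Rank below the column count means X'X is singular.
print("all dummies:   columns =", X_trap.shape[1], "rank =", rank_trap)
print("drop_first:    columns =", X_ok.shape[1], "rank =", rank_ok)
```

With the reduced set, the coefficients of the remaining dummies are differences from the dropped reference class. As far as I know, spreg estimators such as spreg.OLS and spreg.GM_Lag add the constant themselves, so you would pass only the numeric columns plus the reduced dummies as x, without the column of ones.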