How do you calculate p-values for a scikit-learn model with a Tweedie Regressor?


I am using scikit-learn to estimate the p-values of my GLM with a Tweedie distribution.

First, I estimate the p-values with statsmodels to get reference values to match. Here is the result from statsmodels:

[Screenshot: statsmodels Tweedie GLM coefficients and p-values]

Then, I create the same model in scikit-learn and try to estimate p-values:

[Screenshot: scikit-learn model coefficients]

[Screenshot: manual calculation of p-values]

These p-values are quite far from the statsmodels ones. I would expect some difference, since the coefficients don't match perfectly (though they are generally close), but the gap between the p-values is quite large.

I believe the error is in how my variance-covariance matrix is being calculated (vcov in the screenshot above). Do you know what the variance-covariance matrix estimation should look like for a weighted GLM?
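For reference, the textbook Wald calculation for a weighted GLM can be sketched as follows. This is only a sketch under stated assumptions, not necessarily what statsmodels does internally: it assumes a log link, an unpenalized fit, weights that act as variance weights, and a Pearson estimate of the dispersion phi. The function `tweedie_wald_pvalues` and its argument names are hypothetical. The variance-covariance matrix is phi * (X' W X)^-1 with working weights W = w * mu^(2 - power), and the intercept column has to be included in X.

```python
import math
import numpy as np


def tweedie_wald_pvalues(X, y, coef, intercept, power, sample_weight=None):
    """Wald z-tests for a log-link Tweedie GLM that was fitted elsewhere.

    X holds the features WITHOUT an intercept column; coef and intercept
    are the fitted values (assumed to come from an unpenalized fit).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    if sample_weight is None:
        w = np.ones(n)
    else:
        w = np.asarray(sample_weight, dtype=float)

    # Design matrix with the intercept as the first column.
    Xd = np.column_stack([np.ones(n), X])
    beta = np.concatenate([[intercept], np.ravel(coef)])

    # Log link: mu = exp(eta), so d(mu)/d(eta) = mu.
    mu = np.exp(Xd @ beta)

    # Working weights: w * (dmu/deta)^2 / V(mu), with V(mu) = mu**power.
    working_w = w * mu ** (2.0 - power)

    # Pearson estimate of the dispersion (assumes variance weights).
    df_resid = n - Xd.shape[1]
    phi = np.sum(w * (y - mu) ** 2 / mu ** power) / df_resid

    # Variance-covariance matrix: phi * (X' W X)^-1.
    xtwx = Xd.T @ (Xd * working_w[:, None])
    vcov = phi * np.linalg.inv(xtwx)

    se = np.sqrt(np.diag(vcov))
    z = beta / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return vcov, se, pvals
```

With a fitted sklearn model this would be called as `tweedie_wald_pvalues(X, y, model.coef_, model.intercept_, power, sample_weight=w)`. Leaving the working weights or the dispersion out of the calculation, or dropping the intercept column, will shift the standard errors and therefore the p-values.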

I'm trying to estimate p-values for a TweedieRegressor in sklearn. I expected my estimated p-values to match the p-values from a statsmodels GLM with a Tweedie distribution.


There is 1 answer

Simone On

Without more detail on the data, the code, or the aim of the analysis, it is hard to tell. I looked at your screenshots and will try to answer.

First, I summarise my interpretation of the screenshots:

The first screenshot seems to use the statsmodels package from Python.

The second screenshot shows the results from the TweedieRegressor from sklearn's linear_model package.

The third screenshot seems to use the normal distribution to compute z-scores, drawing on the stats package from scipy.

I am not an expert on statsmodels or sklearn's TweedieRegressor. But in my experience, it is not unusual for different software packages to arrive at different results, because they use different computation methods and default parameters.
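One concrete example of such a parameter difference: sklearn's TweedieRegressor applies an L2 penalty by default (`alpha=1.0`), whereas a plain statsmodels GLM fit is unpenalized. The sketch below is only an illustration on synthetic data, assuming gamma-distributed responses (Tweedie `power=2`), a log link, and made-up coefficients and weights; none of this comes from the question's data.

```python
import numpy as np
from sklearn.linear_model import TweedieRegressor

# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
true_coef = np.array([0.5, -0.2])
mu = np.exp(0.3 + X @ true_coef)
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # gamma response with mean mu
w = rng.uniform(0.5, 2.0, size=n)         # arbitrary sample weights

# Default settings: alpha=1.0 shrinks the coefficients towards zero.
penalized = TweedieRegressor(power=2, link="log", max_iter=1000)
penalized.fit(X, y, sample_weight=w)

# alpha=0 removes the penalty, which is the fit comparable to an
# unpenalized GLM from another package.
unpenalized = TweedieRegressor(power=2, alpha=0, link="log", max_iter=1000)
unpenalized.fit(X, y, sample_weight=w)

print("alpha=1:", penalized.intercept_, penalized.coef_)
print("alpha=0:", unpenalized.intercept_, unpenalized.coef_)
```

If the sklearn model in the question was fitted with the default `alpha`, that alone could explain coefficients that are close to, but not the same as, the statsmodels ones.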

So far, I have not found an easy way to estimate the coefficients together with their p-values with the TweedieRegressor. I would recommend using R. It is much easier, and there are a lot of useful tutorials available. I googled "R GLM"; the first two results were Statmethods and the ETH help pages on GLM.

As GLMs relax the assumption of a normal distribution, I am not sure how computing z-scores with the normal distribution would help you. The TweedieRegressor is meant for data that is not normally distributed, such as counts.

If your question is more of a statistical nature, I recommend you post it on Cross Validated.