Python logistic regression in statsmodels using l1 penalty with class weights


I would like to run logistic regression in statsmodels using an l1 penalty (lasso) and class weights, due to a class imbalance. There are several posts that explain how to implement logistic regression with an l1 penalty (ex: here) or with class weights (ex: How to use weights in a logistic regression), but I can't figure out how to do both together.

Here is what I've done so far:

# imports
import numpy as np
import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# generate train and test datasets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=6,
                                                    shuffle=True, stratify=y)

# build an l1 penalized logit model
logit_model_l1 = sm.Logit(y_train, sm.add_constant(X_train))
result_l1 = logit_model_l1.fit_regularized(method='l1')

Results:

                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                 1177
Model:                          Logit   Df Residuals:                     1142
Method:                           MLE   Df Model:                           34
Date:                Wed, 31 Jan 2024   Pseudo R-squ.:                  0.7835
Time:                        10:01:40   Log-Likelihood:                -45.466
converged:                       True   LL-Null:                       -209.96
Covariance Type:            nonrobust   LLR p-value:                 5.523e-50
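
As an aside on the call above: as far as I can tell, Logit.fit_regularized also takes an alpha argument that sets the weight of the l1 penalty (a scalar or one value per parameter), so the penalty strength can presumably be controlled explicitly. The value below is just a placeholder I have not tuned:

# same l1-penalized model as above, but with an explicit penalty weight
result_l1_alpha = logit_model_l1.fit_regularized(method='l1', alpha=1.0)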

# build a class-weighted logit model
logit_model_weighted = sm.GLM(y_train, sm.add_constant(X_train),
                              family=sm.families.Binomial(),
                              freq_weights=np.asarray(y_train))
result_weighted = logit_model_weighted.fit()

# note that if I change ".fit()" to ".fit_regularized(method='l1')" in the line above, I get an error, since 'l1' is not an accepted value for the method argument there.

Results:

                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:                      y   No. Observations:                 1177
Model:                            GLM   Df Residuals:                     1091
Model Family:                Binomial   Df Model:                           34
Link Function:                  Logit   Scale:                          1.0000
Method:                          IRLS   Log-Likelihood:            -1.2001e-09
Date:                Wed, 14 Feb 2024   Deviance:                   2.4032e-09
Time:                        11:30:47   Pearson chi2:                 1.20e-09
No. Iterations:                    26   Pseudo R-squ. (CS):         -2.039e-12
Covariance Type:            nonrobust    

Does anybody know how to build a model that incorporates l1 penalization and class weights in statsmodels?

Note that I have already accomplished this in scikit-learn, but I need the additional statistics that are available via statsmodels.
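
In case it helps, this is the direction I was considering but have not gotten working: the GLM docs suggest that fit_regularized(method='elastic_net', ...) with L1_wt=1.0 reduces the elastic net to a pure l1 penalty, so perhaps that can be combined with observation weights. The inverse-frequency weighting below is my own guess at how to encode the class weights (it is not taken from my code above), the alpha value is an untuned placeholder, and I'm not sure whether var_weights would be more appropriate than freq_weights for non-integer weights:

# sketch (untested): class weights via inverse class frequencies,
# combined with a pure l1 penalty via elastic_net with L1_wt=1.0
class_counts = np.bincount(np.asarray(y_train))        # assumes y_train is coded 0/1
class_weights = len(y_train) / (2.0 * class_counts)    # inverse-frequency class weights
obs_weights = np.where(y_train == 1, class_weights[1], class_weights[0])

logit_model_combined = sm.GLM(y_train, sm.add_constant(X_train),
                              family=sm.families.Binomial(),
                              freq_weights=obs_weights)
result_combined = logit_model_combined.fit_regularized(method='elastic_net',
                                                       alpha=0.1,   # placeholder, needs tuning
                                                       L1_wt=1.0)   # 1.0 -> pure l1 penalty
print(result_combined.params)

I'm also not sure whether the regularized result object exposes the full summary statistics I need, which is part of why I'm asking.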
