Exactly the same accuracy values in RFECV


I'm trying to fit a logistic regression with RFECV. That's my code:

import random

from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression(solver="lbfgs",
                             max_iter=1000)
random.seed(4711)
rfecv = RFECV(estimator=log_reg,
              scoring="accuracy",
              cv=10)

Model = rfecv.fit(X_train, y_train)

I don't think there is anything wrong with my data or my code, but the accuracy is exactly the same for almost every feature-subset size:

Model.grid_scores_
array([0.76200776, 0.76200776, 0.76200776, 0.76200776, 0.76200776,
       0.76200776, 0.76200776, 0.76200776, 0.76200776, 0.76200776,
       0.76200776, 0.76200776, 0.76200776, 0.76200776, 0.76200776,
       0.76200776, 0.76200776, 0.76200776, 0.76200776, 0.76556425,
       0.80968999, 0.80962074])

How can this happen? My data set is quite large (more than 20,000 observations), so I can't imagine that exactly the same cases are classified correctly in every fold of the cross-validation. And if the scores really are identical, how can 1 variable explain as much as 19 can, but not as much as 20 can? Then why not just keep the first and the 20th? I'm really confused.
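Edit: to see which features RFECV actually keeps, I also looked at the mask and ranking it produces. Here is a small synthetic example (not my real data) illustrating those attributes:

```python
# Small synthetic example (not my real data) to inspect what RFECV keeps
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)

rfecv = RFECV(estimator=LogisticRegression(max_iter=1000),
              scoring="accuracy", cv=5)
rfecv.fit(X, y)

print(rfecv.n_features_)  # size of the best-scoring feature subset
print(rfecv.support_)     # boolean mask over the original columns
print(rfecv.ranking_)     # 1 = kept; larger numbers were pruned earlier
```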

1 Answer

Arturo Sbr:

I believe all your accuracies are the same because LogisticRegression uses L2 regularization by default; that is, penalty='l2' unless you pass it something else.

This means that even when Model uses all 22 features, the underlying estimator log_reg penalizes the beta coefficients with L2 regularization. Pruning the least important features therefore barely affects accuracy, because the full 22-feature logit model has already pushed their coefficients close to zero.
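You can see this shrinkage effect directly on synthetic data (a hypothetical demo, not the asker's data set; a very large C is used to approximate "no penalty" in a version-independent way):

```python
# Hypothetical demo (synthetic data, not the asker's): the default L2
# penalty shrinks the weight vector, so dropping weakly weighted
# features barely moves the CV accuracy.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=2, n_redundant=0,
                           random_state=0)

l2 = LogisticRegression(max_iter=5000).fit(X, y)            # defaults: C=1, penalty='l2'
unpen = LogisticRegression(C=1e6, max_iter=5000).fit(X, y)  # effectively unpenalized

# The penalized fit ends up with a smaller overall weight vector
print(np.linalg.norm(l2.coef_), np.linalg.norm(unpen.coef_))
```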

I suggest you try:

import random

from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Model with no penalty
# (in newer scikit-learn versions, pass penalty=None instead of 'none')
log_reg = LogisticRegression(solver='lbfgs',
                             max_iter=1000,
                             penalty='none')

# Set seed
random.seed(4711)

# Initialize same search as before
rfecv = RFECV(estimator=log_reg,
              scoring='accuracy', 
              cv=10)

# Fit search
rfecv.fit(X_train, y_train)

# Tell us how it went
# (grid_scores_ was removed in newer scikit-learn;
# use rfecv.cv_results_["mean_test_score"] there)
rfecv.grid_scores_
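One caveat worth adding: random.seed has no effect on scikit-learn's cross-validation splits. With an integer cv and a classifier, RFECV uses StratifiedKFold without shuffling, which is deterministic anyway. If you want shuffled but reproducible folds, give the splitter its own random_state (a sketch; the seed 4711 just mirrors the question):

```python
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

log_reg = LogisticRegression(max_iter=1000)

# random.seed() does not influence these splits; control them here instead
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=4711)
rfecv = RFECV(estimator=log_reg, scoring='accuracy', cv=cv)
```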