The imblearn library is a library for imbalanced classification. It allows you to use scikit-learn
estimators while balancing the classes with a variety of methods, from undersampling to oversampling to ensembles.
My question, however, is: how can I get the feature importance of the estimator after using BalancedBaggingClassifier
or any other sampling method from imblearn?
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split  # cross_validation was removed in newer scikit-learn
from sklearn.metrics import confusion_matrix
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_classes=2, class_sep=2, weights=[0.1, 0.9],
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=20, n_clusters_per_class=1,
                           n_samples=1000, random_state=10)
print('Original dataset shape {}'.format(Counter(y)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bbc = BalancedBaggingClassifier(random_state=42,
                                base_estimator=DecisionTreeClassifier(
                                    criterion='gini',  # criteria_ was undefined; 'gini' is the default
                                    max_features='sqrt',
                                    random_state=1),
                                n_estimators=2000)
bbc.fit(X_train, y_train)
Not all estimators in sklearn allow you to get feature importances (for example, BaggingClassifier doesn't). If the estimator does, it looks like it should just be stored as estimator.feature_importances_, since the imblearn package subclasses from sklearn classes. I don't know what estimators imblearn has implemented, so I don't know if there are any that provide feature_importances_, but in general you should look at the sklearn documentation for the corresponding object to see if it does.
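For instance (a quick sketch reusing the data from the question; the plain BaggingClassifier here is only for illustration), hasattr shows which fitted estimators expose the attribute:

from sklearn.ensemble import BaggingClassifier

tree = DecisionTreeClassifier().fit(X_train, y_train)
bag = BaggingClassifier(DecisionTreeClassifier()).fit(X_train, y_train)
print(hasattr(tree, 'feature_importances_'))  # True: a fitted tree exposes importances
print(hasattr(bag, 'feature_importances_'))   # False: BaggingClassifier does not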
You can, in this case, look at the feature importances for each of the estimators within the BalancedBaggingClassifier, like this:
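In the imblearn version I'm looking at, each member of bbc.estimators_ is a pipeline chaining the sampler with the decision tree, so the tree sits in the last step; inspect bbc.estimators_[0] to confirm the layout in your version. A minimal sketch under that assumption:

# Each fitted ensemble member is a Pipeline (sampler -> tree);
# the final step holds the DecisionTreeClassifier and its importances.
for estimator in bbc.estimators_:
    tree = estimator.steps[-1][1]
    print(tree.feature_importances_)

And you can print the mean importance across the estimators like this:

import numpy as np

# Average each feature's importance over all trees in the ensemble.
mean_importances = np.mean(
    [est.steps[-1][1].feature_importances_ for est in bbc.estimators_],
    axis=0,
)
print(mean_importances)

Averaging over the ensemble members is essentially the same thing RandomForestClassifier does internally to produce its own feature_importances_ attribute.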