The imblearn library is a library for imbalanced classification. It allows you to use scikit-learn
estimators while balancing the classes with a variety of methods, from undersampling to oversampling to ensembles.
My question, however, is: how can I get the feature importance of the estimator after using BalancedBaggingClassifier
or any other sampling method from imblearn?
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split  # cross_validation was removed in newer scikit-learn
from sklearn.metrics import confusion_matrix
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_classes=2, class_sep=2, weights=[0.1, 0.9],
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=20, n_clusters_per_class=1,
                           n_samples=1000, random_state=10)
print('Original dataset shape {}'.format(Counter(y)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

bbc = BalancedBaggingClassifier(random_state=42,
                                base_estimator=DecisionTreeClassifier(
                                    criterion='gini',  # criteria_ was undefined; 'gini' is the default
                                    max_features='sqrt',
                                    random_state=1),
                                n_estimators=2000)
bbc.fit(X_train, y_train)
Not all estimators in sklearn allow you to get feature importances (for example, BaggingClassifier doesn't). If the estimator does, it looks like it should just be stored as estimator.feature_importances_, since the imblearn package subclasses from sklearn classes. I don't know what estimators imblearn has implemented, so I don't know if there are any that provide feature_importances_, but in general you should look at the sklearn documentation for the corresponding object to see if it does.
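For instance (a quick sketch reusing the data from the question; the plain BaggingClassifier here is only for illustration), hasattr shows which fitted estimators expose the attribute:

from sklearn.ensemble import BaggingClassifier

tree = DecisionTreeClassifier().fit(X_train, y_train)
bag = BaggingClassifier(DecisionTreeClassifier()).fit(X_train, y_train)
print(hasattr(tree, 'feature_importances_'))  # True: a fitted tree exposes importances
print(hasattr(bag, 'feature_importances_'))   # False: BaggingClassifier does not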
You can, in this case, look at the feature importances for each of the estimators within the BalancedBaggingClassifier, like this:
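In the imblearn version I'm looking at, each member of bbc.estimators_ is a pipeline chaining the sampler with the decision tree, so the tree sits in the last step; inspect bbc.estimators_[0] to confirm the layout in your version. A minimal sketch under that assumption:

# Each fitted ensemble member is a Pipeline (sampler -> tree);
# the final step holds the DecisionTreeClassifier and its importances.
for estimator in bbc.estimators_:
    tree = estimator.steps[-1][1]
    print(tree.feature_importances_)

And you can print the mean importance across the estimators like this:

import numpy as np

# Average each feature's importance over all trees in the ensemble.
mean_importances = np.mean(
    [est.steps[-1][1].feature_importances_ for est in bbc.estimators_],
    axis=0,
)
print(mean_importances)

Averaging over the ensemble members is essentially the same thing RandomForestClassifier does internally to produce its own feature_importances_ attribute.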