The imblearn library is a Python library for imbalanced classification. It lets you use scikit-learn estimators while balancing the classes with a variety of methods, from undersampling to oversampling to ensembles.
My question, however, is: how can I get the feature importance of the estimator after using BalancedBaggingClassifier, or any other sampling method from imblearn?
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split  # cross_validation was removed from scikit-learn
from sklearn.metrics import confusion_matrix
from imblearn.ensemble import BalancedBaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Imbalanced toy dataset: roughly a 10%/90% class split
X, y = make_classification(n_classes=2, class_sep=2, weights=[0.1, 0.9],
                           n_informative=3, n_redundant=1, flip_y=0,
                           n_features=20, n_clusters_per_class=1,
                           n_samples=1000, random_state=10)
print('Original dataset shape {}'.format(Counter(y)))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion is set explicitly; the original snippet referenced an undefined variable
bbc = BalancedBaggingClassifier(random_state=42,
                                base_estimator=DecisionTreeClassifier(criterion='gini',
                                                                      max_features='sqrt',
                                                                      random_state=1),
                                n_estimators=2000)
bbc.fit(X_train, y_train)
Not all estimators in sklearn allow you to get feature importances (for example, BaggingClassifier doesn't). If the estimator does, it looks like it should just be stored as estimator.feature_importances_, since the imblearn package subclasses from sklearn classes. I don't know what estimators imblearn has implemented, so I don't know if there are any that provide feature_importances_, but in general you should look at the sklearn documentation for the corresponding object to see if it does.

You can, in this case, look at the feature importances for each of the estimators within the BalancedBaggingClassifier, like this:
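(The original snippet is missing here; below is a minimal sketch. It assumes each fitted member of bbc.estimators_ is an imblearn Pipeline that wraps a sampler and the tree, so the final pipeline step holds the fitted DecisionTreeClassifier; the exact step layout may vary between imblearn versions.)

# Each member of bbc.estimators_ is assumed to be a Pipeline whose final
# step is the fitted tree, which exposes feature_importances_
for estimator in bbc.estimators_:
    print(estimator.steps[-1][1].feature_importances_)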
And you can print the mean importance across the estimators like this:
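Again a sketch, under the same assumption about the Pipeline layout:

import numpy as np

# Average the per-tree importances across all estimators in the ensemble
print(np.mean([est.steps[-1][1].feature_importances_
               for est in bbc.estimators_], axis=0))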