I have trained a SequentialFeatureSelector from sklearn and am now interested in the best model (based on the given scoring method) it produced. Is there a way of extracting the parameters and using them to regenerate the model that was used?

I have seen that there exists a `get_params()` function for the SequentialFeatureSelector, but I don't understand how to interpret the output and retrieve the best estimator.
The main result of this model is which features it decided to select. You can access that information in various ways. Suppose you have a fitted selector:

`selector = SequentialFeatureSelector(...).fit(...)`

`selector.support_` is a boolean vector, where `True` means it selected that feature. If you started off with 5 features and told it to select 2, then the vector will be `[True, False, False, False, True]` if it selected the first and last feature.

You can get the same output using `selector.get_support()`. If you want the indices rather than a boolean vector, you can use `selector.get_support(indices=True)` - it'll return `[0, 4]` in this case, indicating feature number 0 and feature number 4.

To get the feature names (only applies if you fed the model a dataframe):
`selector.feature_names_in_[selector.support_]`

After fitting the selector, if you want it to strip out the unselected features, you can use `selector.transform(X_test)`. The `.transform(X_test)` call will apply the already-fitted selector to the supplied data. In this example, if `X_test` is 100 x 5, then it'll return a 100 x 2 version where it has only kept the features determined from the initial `.fit()`.

`SequentialFeatureSelector` doesn't keep any of the models fitted during cross-validation, so I think you'd need to fit a new model using the selected features. Alternatively, you can fit a clone of the selector's estimator; that ensures consistency with the original estimator by saving you from needing to manually specify all the original parameters.
If you want identical models (down to the random seed), you'll also need to set up the CV appropriately.
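For instance (the splitter choice and seed are arbitrary), passing an explicitly seeded CV splitter makes the splits, and therefore the selection, reproducible across runs:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=100, n_features=5, random_state=0)

# A seeded splitter yields identical folds on every run
cv = KFold(n_splits=5, shuffle=True, random_state=42)

selector = SequentialFeatureSelector(
    LinearRegression(), n_features_to_select=2, cv=cv
).fit(X, y)
```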