In sklearn, does a fitted pipeline reapply every transform?


Apologies if this is obvious but I couldn't find a clear answer to this:

Say I've used a pretty typical pipeline:

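from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import RandomizedLogisticRegression
from sklearn.pipeline import Pipeline

feat_sel = RandomizedLogisticRegression()
clf = RandomForestClassifier()
pl = Pipeline([('preprocessing', preprocessing.StandardScaler()),
               ('feature_selection', feat_sel),
               ('classification', clf)])
pl.fit(X, y)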

Now when I apply pl on a new set,

pl.predict(X_classify)

is RandomizedLogisticRegression going to be reapplied, or are the columns that were selected in training going to be used on the new data? If not, is there a way for the pipeline to differentiate between feature selectors and feature extractors/scalers/other transforms that should be applied to the new input? Until I'm sure, I'm skipping the pipeline feature and just doing each step manually and maintaining state.

Thanks!


There is 1 answer

Andreas Mueller (best answer)

When you call pl.predict, the pipeline calls transform (not fit) on the preprocessing and feature selection steps. That means the features selected during training are the ones picked out of the test data (the only thing that makes sense here).

It is unclear what you mean by "apply" here. Nothing new is learned when calling "predict", but every step before the classifier is run on the new data with "transform", using what it learned during "fit".
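To make the fit/transform split concrete, here is a toy pure-Python sketch of the mechanism. It is not scikit-learn's implementation: MiniPipeline, MeanCenterer, VarianceSelector and SignClassifier are hypothetical stand-ins invented for illustration, and the data is made up. The point it tries to show is that the columns chosen during fit are reused unchanged at predict time, even when the new data would have led to a different choice.

```python
class MeanCenterer:
    """Toy scaler (hypothetical): learns column means in fit, subtracts them in transform."""
    def __init__(self):
        self.fit_calls = 0

    def fit(self, X, y=None):
        self.fit_calls += 1
        n = len(X)
        self.means_ = [sum(row[j] for row in X) / n for j in range(len(X[0]))]
        return self

    def transform(self, X):
        return [[v - m for v, m in zip(row, self.means_)] for row in X]


class VarianceSelector:
    """Toy feature selector (hypothetical): keeps the k highest-variance columns seen in fit."""
    def __init__(self, k):
        self.k = k
        self.fit_calls = 0

    def fit(self, X, y=None):
        self.fit_calls += 1
        n = len(X)
        variances = []
        for j in range(len(X[0])):
            col = [row[j] for row in X]
            mean = sum(col) / n
            variances.append(sum((v - mean) ** 2 for v in col) / n)
        ranked = sorted(range(len(variances)), key=lambda j: variances[j], reverse=True)
        self.selected_ = sorted(ranked[:self.k])  # column indices learned once, in fit
        return self

    def transform(self, X):
        # Only slices the columns remembered from fit; learns nothing new.
        return [[row[j] for j in self.selected_] for row in X]


class SignClassifier:
    """Toy final estimator (hypothetical): predicts 1 when the feature sum is positive."""
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [1 if sum(row) > 0 else 0 for row in X]


class MiniPipeline:
    """Simplified sketch of the fit/predict flow described in the answer."""
    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y):
        for _, step in self.steps[:-1]:
            X = step.fit(X, y).transform(X)  # learn, then transform the training data
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        for _, step in self.steps[:-1]:
            X = step.transform(X)  # transform only, no refitting
        return self.steps[-1][1].predict(X)


scaler = MeanCenterer()
selector = VarianceSelector(k=2)
pl = MiniPipeline([('preprocessing', scaler),
                   ('feature_selection', selector),
                   ('classification', SignClassifier())])

# Made-up training data: column 2 barely varies, so columns 0 and 1 are kept.
X_train = [[1.0, 10.0, 0.0], [2.0, -10.0, 0.1], [3.0, 20.0, 0.0], [4.0, -20.0, 0.1]]
y_train = [1, 0, 1, 0]
pl.fit(X_train, y_train)

# In the new data column 2 varies the most, but the selection is not recomputed.
X_new = [[100.0, 0.0, 5.0], [0.0, 0.0, -50.0]]
preds = pl.predict(X_new)
print(selector.selected_, selector.fit_calls, preds)
```

With the real scikit-learn Pipeline the same idea applies: each fitted step is still reachable through pl.named_steps, so you can inspect what a selector learned in training without stepping through the pipeline by hand.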