In sklearn, does a fitted pipeline reapply every transform?

Question

493 views Asked by konel At 22 June 2015 at 02:49

Apologies if this is obvious but I couldn't find a clear answer to this:

Say I've used a pretty typical pipeline:

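from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
# RandomizedLogisticRegression was removed in scikit-learn 0.21, so this import needs an older release
from sklearn.linear_model import RandomizedLogisticRegression
from sklearn.pipeline import Pipeline

feat_sel = RandomizedLogisticRegression()
clf = RandomForestClassifier()
pl = Pipeline([('preprocessing', preprocessing.StandardScaler()),
               ('feature_selection', feat_sel),
               ('classification', clf)])
pl.fit(X, y)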

Now when I apply pl on a new set,

pl.predict(X_classify)

is RandomizedLogisticRegression going to be reapplied, or are the columns that were selected in training going to be used on the new data? If not, is there a way for the pipeline to differentiate between feature selectors and feature extractors/scalers/other transforms that should be applied to the new input? Until I'm sure, I'm skipping the pipeline feature and just doing each step manually and maintaining state.

Thanks!

There is 1 answer

**Andreas Mueller** · Accepted Answer · 2015-06-22T14:30:28+00:00

The pipeline calls transform on the preprocessing and feature selection steps if you call pl.predict. That means that the features selected in training will be selected from the test data (the only thing that makes sense here).

It is unclear what you mean by "apply" here. Nothing new will be learned when calling "predict", but all steps will be used with "transform".
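A small check of this behaviour, written as a sketch rather than a reproduction of the question's setup: because RandomizedLogisticRegression is no longer available in current scikit-learn releases, SelectKBest is used here as a stand-in feature selector, and the data comes from make_classification. Both choices, along with the variable names, are assumptions made for illustration. The idea is to compare the selector's mask before and after predict, and to compare the pipeline's prediction with the same steps applied by hand using transform only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 150 rows for training, 50 "new" rows to classify.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)
X_train, y_train = X[:150], y[:150]
X_new = X[150:]

pl = Pipeline([
    ("preprocessing", StandardScaler()),
    # Stand-in selector (assumption), not the one from the question.
    ("feature_selection", SelectKBest(f_classif, k=5)),
    ("classification", RandomForestClassifier(n_estimators=50, random_state=0)),
])
pl.fit(X_train, y_train)

# Columns chosen during fit.
mask_before = pl.named_steps["feature_selection"].get_support().copy()

# predict only calls transform on the intermediate steps; nothing is refit.
pipeline_pred = pl.predict(X_new)
mask_after = pl.named_steps["feature_selection"].get_support()

# The same thing done by hand with the already-fitted steps.
scaled = pl.named_steps["preprocessing"].transform(X_new)
selected = pl.named_steps["feature_selection"].transform(scaled)
manual_pred = pl.named_steps["classification"].predict(selected)

same_mask = np.array_equal(mask_before, mask_after)
same_pred = np.array_equal(pipeline_pred, manual_pred)
print("selected columns unchanged by predict:", same_mask)
print("pipeline prediction equals manual chain:", same_pred)
```

If both checks come out true, the columns picked on the training data are the ones taken from the new data, which is the state the question was maintaining by hand.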
