Combining labeled and unlabeled data in a single pipeline


I'm building an image classifier that uses a DBN (deep belief network) for feature learning and logistic regression to fine-tune the resulting network. Normally, the most convenient way to implement such an architecture in scikit-learn is to use the Pipeline class. But in my case I have ~10K unlabeled images and only ~300 labeled ones. Naturally, I want to train the DBN on all of the images and fit the logistic regression on only the labeled examples.

I could implement my own Pipeline class to handle this case, but first I'd like to know whether something like this already exists. Does it?


There is 1 answer

ogrisel (best answer)

The current scikit-learn Pipeline API is not well suited for supervised learning with unsupervised pre-training. Implementing your own wrapper class is probably the best way to go forward for that case.
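
As a rough illustration of what such a wrapper might look like, here is a minimal sketch. It is not an existing scikit-learn class: the name `PretrainedClassifier` and its parameters are hypothetical, and it assumes unlabeled rows are marked with `y == -1` (the convention used in `sklearn.semi_supervised`). scikit-learn has no DBN estimator, so a single `BernoulliRBM` stands in for the unsupervised feature learner, and the logistic regression is simply stacked on the learned features rather than fine-tuning the network weights.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM


class PretrainedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: fit the transformer on all rows,
    then fit the classifier only on rows that carry a label."""

    def __init__(self, transformer, classifier, unlabeled=-1):
        self.transformer = transformer
        self.classifier = classifier
        self.unlabeled = unlabeled  # assumed marker for "no label"

    def fit(self, X, y):
        X = np.asarray(X)
        y = np.asarray(y)
        # Unsupervised step sees every sample, labeled or not.
        self.transformer_ = clone(self.transformer).fit(X)
        # Supervised step sees only the labeled subset.
        labeled = y != self.unlabeled
        features = self.transformer_.transform(X[labeled])
        self.classifier_ = clone(self.classifier).fit(features, y[labeled])
        self.classes_ = self.classifier_.classes_
        return self

    def predict(self, X):
        features = self.transformer_.transform(np.asarray(X))
        return self.classifier_.predict(features)


# Toy data standing in for images: values in [0, 1], mostly unlabeled.
rng = np.random.RandomState(0)
X = rng.rand(200, 16)
y = np.full(200, -1)          # -1 marks unlabeled samples
y[:20] = np.arange(20) % 2    # only the first 20 rows have labels

model = PretrainedClassifier(
    BernoulliRBM(n_components=8, n_iter=5, random_state=0),
    LogisticRegression(),
)
model.fit(X, y)
predictions = model.predict(X[:5])
```

Keeping the labeled/unlabeled split inside `fit` means the wrapper can be passed a single `(X, y)` pair like any other estimator, which is the part that a plain Pipeline does not handle.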