I'm building an image classifier that uses a DBN (deep belief network) for feature learning and logistic regression to fine-tune the resulting network. Normally, the most convenient way to implement such an architecture in scikit-learn is to use the Pipeline class. But in my case I have ~10K unlabeled images and only ~300 labeled ones. Naturally, I want to use all the images to train the DBN and fit the logistic regression with only the labeled examples.
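For reference, the usual Pipeline setup looks roughly like the sketch below. scikit-learn has no DBN class, so a single `BernoulliRBM` (one DBN layer) stands in for it, and the data is random placeholder data rather than real images. The point is that `Pipeline.fit` passes the same `X` to every step, so the RBM only ever sees the ~300 labeled images.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
# Placeholder data: 300 labeled "images" of 64 pixels scaled to [0, 1]
X_labeled = rng.rand(300, 64)
y_labeled = rng.randint(0, 2, size=300)

pipe = Pipeline([
    # BernoulliRBM stands in for the DBN (scikit-learn has no DBN estimator)
    ("rbm", BernoulliRBM(n_components=32, n_iter=5, random_state=0)),
    ("logistic", LogisticRegression(max_iter=1000)),
])
# fit() feeds the same X to both steps: the unlabeled images cannot be used here
pipe.fit(X_labeled, y_labeled)
```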
I could implement my own Pipeline class to handle this case, but first I'd like to know whether something like this already exists. Does it?
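The current scikit-learn Pipeline API is not well suited for supervised learning with unsupervised pre-training: `Pipeline.fit` passes the same `X` to every step, so there is no way to fit the feature-learning step on a larger (unlabeled) dataset than the final classifier. Implementing your own wrapper class is probably the best way forward in that case.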
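A minimal sketch of such a wrapper, assuming the unlabeled images are passed to `fit` through an extra keyword argument. The class name `PretrainedClassifier` and the `X_unlabeled` parameter are hypothetical, not part of scikit-learn; `BernoulliRBM` again stands in for the DBN, and the data is random placeholder data.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM


class PretrainedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: unsupervised pre-training on all images,
    supervised fitting on the labeled subset only."""

    def __init__(self, feature_learner, classifier):
        self.feature_learner = feature_learner
        self.classifier = classifier

    def fit(self, X, y, X_unlabeled=None):
        # Pre-train the feature learner on labeled + unlabeled images (labels unused)
        X_all = X if X_unlabeled is None else np.vstack([X, X_unlabeled])
        self.feature_learner_ = clone(self.feature_learner).fit(X_all)
        # Fit the classifier on the learned features of the labeled images only
        features = self.feature_learner_.transform(X)
        self.classifier_ = clone(self.classifier).fit(features, y)
        self.classes_ = self.classifier_.classes_
        return self

    def predict(self, X):
        return self.classifier_.predict(self.feature_learner_.transform(X))

    def predict_proba(self, X):
        return self.classifier_.predict_proba(self.feature_learner_.transform(X))


# Placeholder data: 1000 unlabeled and 300 labeled "images" of 64 pixels in [0, 1]
rng = np.random.RandomState(0)
X_unlabeled = rng.rand(1000, 64)
X_labeled = rng.rand(300, 64)
y_labeled = rng.randint(0, 2, size=300)

model = PretrainedClassifier(
    BernoulliRBM(n_components=32, n_iter=5, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(X_labeled, y_labeled, X_unlabeled=X_unlabeled)
```

Because `feature_learner` only needs `fit` and `transform`, you could pass a `Pipeline` of several `BernoulliRBM` steps to get greedy layer-wise pre-training closer to a deeper DBN. Note that this sketch does not backpropagate through the RBM layers, so the logistic regression is trained on frozen features rather than fine-tuning the whole network.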