I am trying to use a Pipeline to build my model. I want to predict multiple outputs with a random forest classifier. Since a pipeline allows only its last step to be the classifier, I wrapped the random forest in a MultiOutputClassifier and used that as the last step. This works fine without GridSearchCV:
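from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier(), n_jobs=-1)),
])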
Now I am trying to pass several parameters to my RF classifier, but since it is nested, they are passed to the MultiOutputClassifier instead. At least that's what I think happens.
param_grid = {
    'clf__n_estimators': [200, 500],
    'clf__max_features': ['auto', 'sqrt', 'log2'],
    'clf__max_depth': [4, 5, 6, 7, 8],
    'clf__criterion': ['gini', 'entropy'],
}
cv = GridSearchCV(pipeline, param_grid=param_grid)
This results in an error: ValueError: Invalid parameter criterion for estimator
Is there a way to pass the params to my RandomForestClassifier, or is there a way to pipe multiple classifiers?
Try this:
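A minimal sketch of the fix, assuming the pipeline from the question. MultiOutputClassifier stores the wrapped forest under its `estimator` parameter, so each key needs the prefix `clf__estimator__` rather than `clf__`. The value `'auto'` is left out of `max_features` on the assumption that you run a recent scikit-learn release, which rejects it; put it back if your version still accepts it.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

# Same pipeline as in the question.
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier(), n_jobs=-1)),
])

# step name -> wrapper's 'estimator' parameter -> forest parameter
param_grid = {
    'clf__estimator__n_estimators': [200, 500],
    # 'auto' omitted: assumed unsupported in the installed scikit-learn version
    'clf__estimator__max_features': ['sqrt', 'log2'],
    'clf__estimator__max_depth': [4, 5, 6, 7, 8],
    'clf__estimator__criterion': ['gini', 'entropy'],
}

cv = GridSearchCV(pipeline, param_grid=param_grid)
```

Calling `cv.fit(X, y)` with your own documents and multi-output labels then searches over the forest's parameters instead of the wrapper's.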
In general, you can access tunable params with:
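A short sketch of that lookup, again assuming the pipeline from the question: `get_params()` exists on every scikit-learn estimator, and its keys are exactly the names GridSearchCV accepts. The variable `names` is only there for illustration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', MultiOutputClassifier(RandomForestClassifier(), n_jobs=-1)),
])

# Every key listed here is a valid name for param_grid.
names = sorted(pipeline.get_params().keys())
for name in names:
    print(name)
```

The forest's parameters show up with the `clf__estimator__` prefix in that listing, which is why the `clf__criterion` key from the question is rejected.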