I have a two-stage meta-estimator that is initialized with two pipelines. The estimator is meant to classify observations into 1, -1, or 0. The first pipeline learns to distinguish 0 from (1, -1), and the second learns to distinguish 1 from -1 after all the 0s have been removed from its training data. Here is the code for the meta-estimator:
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.utils.validation import check_is_fitted

class TwoStageEstimator(BaseEstimator, ClassifierMixin):
    def __init__(self, pipeline_1, pipeline_2):
        self.pipeline_1 = pipeline_1
        self.pipeline_2 = pipeline_2

    def fit(self, X, y):
        # First-stage training: 0 vs. non-zero
        self.pipeline_1 = clone(self.pipeline_1)
        y_train_1 = abs(y)
        self.pipeline = self.pipeline_1.fit(X, y_train_1)
        # Second-stage training: 1 vs. -1, with the 0 rows dropped
        self.pipeline_2 = clone(self.pipeline_2)
        y_train_2 = y[y != 0]
        X_train_2 = X.loc[y_train != 0, ]
        self.pipeline = self.pipeline_2.fit(X_train_2, y_train_2)
        # Set fit status
        self.is_fit_ = True
        return self

    def predict(self, X):
        # Check that fit has been called
        check_is_fitted(self)
        y = self.pipeline_1.predict(X) * self.pipeline_2.predict(X)
        return y
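To make the label handling concrete, here is a small sketch of what the two stages are meant to see and how their outputs are combined. The toy labels and the two prediction arrays are made up for illustration and are not taken from my data.

```python
import numpy as np
import pandas as pd

# Toy labels, invented for this illustration
y = pd.Series([1, 0, -1, 1, 0, -1])

# Stage 1 target: 0 stays 0, while both 1 and -1 become 1
y_stage_1 = y.abs()

# Stage 2 target: only the non-zero labels are kept
y_stage_2 = y[y != 0]

# At prediction time the two outputs are multiplied, so a stage-1
# prediction of 0 zeroes out whatever stage 2 predicted for that row
stage_1_pred = np.array([1, 0, 1])    # hypothetical stage-1 output
stage_2_pred = np.array([1, -1, -1])  # hypothetical stage-2 output
combined = stage_1_pred * stage_2_pred
```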
This all works if I fit and predict directly:
tsm = TwoStageEstimator(pipeline, pipeline)
prd_stance = tsm.fit(X_train, y_train).predict(X_test)
But when I try to use cross-validation, it breaks:
scores = cross_val_score(
    tsm, X, y, scoring='accuracy', cv=ms.StratifiedKFold(n_splits=7, shuffle=True)
)
scores
The error messages seem to suggest a conflict between the boolean indexing inside fit and the row subsetting that cross-validation performs:
raise IndexingError(
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
...
raise NotImplementedError(
NotImplementedError: iLocation based boolean indexing on an integer type is not available
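To check my reading of the first error, here is a minimal sketch that is independent of the estimator. The toy data and the names X_fold, y_fold, and y_other are made up for illustration. It assumes the boolean mask and the frame carry different indexes, which I have not confirmed is what actually happens inside cross_val_score.

```python
import pandas as pd

# Toy data, invented for this illustration
X = pd.DataFrame({"feature": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]})
y = pd.Series([1, 0, -1, 1, 0, -1])

# Pretend a CV split handed fit() only the first four rows
X_fold = X.iloc[[0, 1, 2, 3]]
y_fold = y.iloc[[0, 1, 2, 3]]

# Mask built from the fold's own labels: the indexes match, so this works
rows_ok = X_fold.loc[y_fold != 0]

# Mask built from a Series with a different index (rows 2 to 5)
y_other = y.iloc[[2, 3, 4, 5]]
try:
    X_fold.loc[y_other != 0]
    error_name = None
except Exception as exc:
    # Record the exception class name rather than importing it,
    # since its import path differs between pandas versions
    error_name = type(exc).__name__
```

In this sketch the first selection succeeds and the second raises, because rows 0 and 1 of X_fold have no counterpart in the mask's index.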
Can anyone point me to a solution here?