How to convert custom pipeline (categorical get_dummies) with convert_coreml?

I'm trying to save a custom sklearn pipeline as an ONNX model, but I'm getting errors in the process.

sample code:

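import numpy as np
import pandas as pd

from sklearn.preprocessing import OneHotEncoder
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline

from sklearn import svm
from winmltools import convert_coreml
# convert_sklearn and FloatTensorType are used below but were never imported
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

import copy
from IPython.display import display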
# https://github.com/pandas-dev/pandas/issues/8918

class MyEncoder(TransformerMixin):

    def __init__(self, columns=None):
        self.columns = columns

    def transform(self, X, y=None, **kwargs):
        # np.float was removed in NumPy 1.24; the builtin float is equivalent
        return pd.get_dummies(X, dtype=float, columns=['ID'])

    def fit(self, X, y=None, **kwargs):
        return self

# data
X = pd.DataFrame([[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]], columns=['ID', 'X1', 'X2'])
Y = pd.Series([3, 2, 4])

# check transform
df = MyEncoder().transform(X)
display(df)

# create pipeline
pipe = Pipeline( steps=[('categorical', MyEncoder()), ('classifier', svm.SVR())] )
print(type(pipe), MyEncoder().transform(X).dtypes, '\n')

# prepare models
svm_toy  = svm.SVR()
svm_toy.fit(X,Y)
pipe_toy = copy.deepcopy(pipe).fit(X, Y)

# save onnx

# no problem here
initial_type = [('X', FloatTensorType( [None, X.shape[1]] ) ) ] 
onx = convert_sklearn(svm_toy, initial_types=initial_type  )

# something goes wrong...
initial_type = [('X', FloatTensorType( [None, X.shape[1]] ) ) ] 
onx = convert_sklearn(pipe_toy, initial_types=initial_type  )

The simple conversion goes well:

# no problem here
initial_type = [('X', FloatTensorType( [None, X.shape[1]] ) ) ] 
onx = convert_sklearn(svm_toy, initial_types=initial_type  )

But the pipeline conversion fails:

# something goes wrong...
initial_type = [('X', FloatTensorType( [None, X.shape[1]] ) ) ] 
onx = convert_sklearn(pipe_toy, initial_types=initial_type  )

with the following error:

MissingShapeCalculator: Unable to find a shape calculator for type ''.
It usually means the pipeline being converted contains a
transformer or a predictor with no corresponding converter
implemented in sklearn-onnx. If the converted is implemented
in another library, you need to register
the converted so that it can be used by sklearn-onnx (function
update_registered_converter). If the model is not yet covered
by sklearn-onnx, you may raise an issue to
https://github.com/onnx/sklearn-onnx/issues
to get the converter implemented or even contribute to the
project. If the model is a custom model, a new converter must
be implemented. Examples can be found in the gallery.

Am I missing something with the customized pipeline and the get_dummies?

There is 1 answer


Custom transformers, i.e. ones that sklearn-onnx has no built-in converter for, need extra information before they can be converted to ONNX. You need to write a shape calculator and a converter function for your transformer, and then register the transformer together with these two functions (the error message points to update_registered_converter for this). See more in the documentation.
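One thing worth noting before writing those two functions: a shape calculator has to report a fixed output width, but pd.get_dummies called inside transform derives its columns from whatever batch it receives, so the width can change from call to call. The sketch below is plain Python with no sklearn-onnx calls, and FixedDummies is a hypothetical name used only for illustration. It shows the kind of fit-time state (a stored category list) that a shape calculator and converter could then read from the fitted transformer.

```python
class FixedDummies:
    """Hypothetical one-hot encoder that learns its categories in fit()."""

    def __init__(self, column=0):
        self.column = column

    def fit(self, rows, y=None):
        # Remember the distinct values seen during training, in sorted order.
        self.categories_ = sorted({row[self.column] for row in rows})
        self.n_features_in_ = len(rows[0])
        return self

    def output_width(self):
        # The number a shape calculator would need to report:
        # the untouched columns plus one indicator column per category.
        return self.n_features_in_ - 1 + len(self.categories_)

    def transform(self, rows):
        out = []
        for row in rows:
            rest = [float(v) for i, v in enumerate(row) if i != self.column]
            onehot = [1.0 if row[self.column] == c else 0.0
                      for c in self.categories_]
            # Untouched columns first, then the indicators.
            out.append(rest + onehot)
        return out


# Same toy data as in the question: columns are ID, X1, X2.
train = [[100, 1.1, 3.1], [200, 4.1, 5.1], [100, 4.1, 2.1]]
enc = FixedDummies(column=0).fit(train)

print(enc.output_width())
print(enc.transform([[200, 0.5, 0.7]]))
```

With the categories stored at fit time, a single-row batch still produces four columns, and an ID that was never seen during training maps to all-zero indicators instead of creating a new column.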