Cannot resolve feature name error when converting an XGBClassifier model to ONNX

I need to load a VotingClassifier model (a mix of XGBoost and NaiveBayes) that is in .sav format. The goal is to convert it to ONNX. However, I don't have access to the dataset, so I cannot retrain the model. I encountered the following error:

RuntimeError: Unable to interpret 'ABC', feature names should follow pattern 'f%d'.

How can I definitively change the feature names to solve this error?

I attempted to change the feature names as follows:

features_name_fixed = ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11']
model.feature_names_in_ = features_name_fixed

However, the error persists. When I printed the JSON representation of the model, I noticed that the feature names represented by the field 'split' still use the old names:

 {'nodeid': 0, 'depth': 0, 'split': 'ABC', 'split_condition': 6.25, 'yes': 1, 'no': 2, 'missing': 1, 'gain': 78.1462402, 'cover': 259.75, 'children': [{'nodeid': 1, 'depth': 1, 'split': 'BLK', 'split_condition': 0.550000012, 'yes': 3, 'no': 4, 'missing': 3, 'gain': 21.2281971, 'cover': 171.75, 'children': [{'nodeid': 3, 'depth': 2, 'split': 'ABC', 'split_condition': 4.55000019, 'yes': 7, 'no': 8, 'missing': 7, 'gain': 14.1838226, 'cover': 147.5, 'children': [{'nodeid': 7, 'depth': 3, 'split': 'BLK', 'split_condition': 0.25, 'yes': 11, 'no': 12, 'missing': 11, 'gain': 3.78608131, 'cover': 106.5, 'children': [{'nodeid': 11, 'depth': 4, 'split': 'BLK', 'split_condition': 0.150000006, 'yes': 15, 'no': 16, 'missing': 15, 'gain': 5.14517879, 'cover': 78.5, 'children': [{'nodeid': 15, 'depth': 5, 'split': 'MIN', 'split_condition': 8.64999962, 'yes': 17, 'no': 18, 'missing': 17, 'gain': 4.04689026, 'cover': 60, 'children': [{'nodeid': 17, 'leaf': -0.018082479, 'cover': 23.25}, {'nodeid': 18, 'leaf': -0.00116446253, 'cover': 36.75}]}, {'nodeid': 16, 'leaf': -0.0269777905, 'cover': 18.5}]}, {'nodeid': 12, 'leaf': 0.000966416614, 'cover': 28}]}, {'nodeid': 8, 'depth': 3, 'split': 'MIN', 'split_condition': 16.2000008, 'yes': 13, 'no': 14, 'missing': 13, 'gain': 1.36819792, 'cover': 41, 'children': [{'nodeid': 13, 'leaf': 0.00639292412, 'cover': 19}, {'nodeid': 14, 'leaf': 0.0183946956, 'cover': 22}]}]}, {'nodeid': 4, 'leaf': 0.029015895, 'cover': 24.25}]}, {'nodeid': 2, 'depth': 1, 'split': 'MIN', 'split_condition': 23.9500008, 'yes': 5, 'no': 6, 'missing': 5, 'gain': 8.23132324, 'cover': 88, 'children': [{'nodeid': 5, 'leaf': 0.0257856827, 'cover': 34}, {'nodeid': 6, 'depth': 2, 'split': 'BLK', 'split_condition': 0.350000024, 'yes': 9, 'no': 10, 'missing': 9, 'gain': 2.95942688, 'cover': 54, 'children': [{'nodeid': 9, 'leaf': 0.0367497504, 'cover': 23}, {'nodeid': 10, 'leaf': 0.0526438914, 'cover': 31}]}]}]}
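To make the mismatch concrete, here is a minimal sketch that walks a dumped tree shaped like the one above, lists the 'split' values that do not follow the 'f%d' pattern, and shows what the tree would look like after a rename. The helper names (`collect_split_names`, `rename_splits`), the tiny example tree, and the column order in `original_names` are hypothetical; the real order must match the column order used at training time. Note that this only rewrites a dumped dictionary, it does not modify the booster itself.

```python
import re

# Pattern the converter's error message asks for: 'f' followed by digits.
F_PATTERN = re.compile(r"^f\d+$")


def collect_split_names(node, found=None):
    """Return the set of feature names used in 'split' fields of a dumped tree."""
    if found is None:
        found = set()
    if "split" in node:
        found.add(node["split"])
    for child in node.get("children", []):
        collect_split_names(child, found)
    return found


def rename_splits(node, mapping):
    """Return a copy of the tree with every 'split' value renamed via mapping."""
    renamed = {key: value for key, value in node.items() if key != "children"}
    if "split" in renamed:
        renamed["split"] = mapping[renamed["split"]]
    if "children" in node:
        renamed["children"] = [rename_splits(c, mapping) for c in node["children"]]
    return renamed


# Tiny hypothetical tree in the same shape as the dump above.
tree = {
    "nodeid": 0, "split": "ABC", "split_condition": 6.25,
    "children": [
        {"nodeid": 1, "split": "BLK", "split_condition": 0.55,
         "children": [{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.2}]},
        {"nodeid": 2, "leaf": 0.02},
    ],
}

# Hypothetical column order; it must match the order used during training.
original_names = ["ABC", "BLK", "MIN"]
mapping = {name: f"f{i}" for i, name in enumerate(original_names)}

bad_names = sorted(n for n in collect_split_names(tree) if not F_PATTERN.match(n))
fixed_tree = rename_splits(tree, mapping)
```

Running `collect_split_names` on each tree of the real dump is a quick way to check whether a rename attempt actually reached the booster.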

The code is:

import joblib
from onnxmltools.convert.xgboost.operator_converters.XGBoost import convert_xgboost
from onnxmltools.utils import save_model
from skl2onnx import (
    convert_sklearn,
    get_latest_tested_opset_version,
    update_registered_converter,
)
from skl2onnx.common.data_types import FloatTensorType
from skl2onnx.common.shape_calculator import calculate_linear_classifier_output_shapes
from xgboost import XGBClassifier

# Load the pickled VotingClassifier (XGBoost + Naive Bayes).
model = joblib.load("model.sav")
model = model.set_params(flatten_transform=False)

# Attempted fix: overwrite the scikit-learn feature names (the error persists).
features_name_fixed = ['f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'f10', 'f11']
# print(model.feature_names_in_)
model.feature_names_in_ = features_name_fixed

n_features = 12

# Register the XGBoost converter so skl2onnx can handle XGBClassifier.
target_opset = get_latest_tested_opset_version()
update_registered_converter(
    XGBClassifier,
    "XGBoostXGBClassifier",
    calculate_linear_classifier_output_shapes,
    convert_xgboost,
    options={"nocl": [True, False], "zipmap": [True, False, "columns"]},
)

# Convert the ensemble to ONNX and save it.
onnx_model = convert_sklearn(
    model,
    "gbdt_model",
    initial_types=[("input", FloatTensorType([None, n_features]))],
    target_opset={"": target_opset, "ai.onnx.ml": 1},
)

save_model(onnx_model, 'model_converted.onnx')

The original model was trained with XGBoost version 1.4.2.
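Since the dump shows that the old names live inside the booster rather than in `feature_names_in_`, one direction worth trying is to assign the new names on each sub-estimator's booster. The sketch below is an untested assumption, not a confirmed fix: it assumes the fitted ensemble exposes `estimators_`, that the XGBoost member exposes `get_booster()`, and that `Booster.feature_names` is assignable in the installed XGBoost version. The helper name `reset_booster_feature_names` is hypothetical.

```python
def reset_booster_feature_names(ensemble, new_names):
    """Assign new_names to the booster of every fitted sub-estimator that has one.

    Returns the list of sub-estimators whose booster was updated.
    """
    touched = []
    for estimator in getattr(ensemble, "estimators_", []):
        get_booster = getattr(estimator, "get_booster", None)
        if get_booster is None:
            continue  # e.g. the Naive Bayes member has no booster
        booster = get_booster()
        # Assumption: the booster accepts assignment to `feature_names`.
        booster.feature_names = list(new_names)
        touched.append(estimator)
    return touched
```

With the code above, this would be called as `reset_booster_feature_names(model, features_name_fixed)` before `convert_sklearn`, then the JSON dump can be printed again to confirm that 'split' now shows names such as 'f0'. A pickle created with XGBoost 1.4.2 may also behave differently when loaded under another XGBoost version, so it is worth checking the installed version first.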
