How to set important features as an attribute on XGBRegressor and save it as part of the JSON when saving the model

Question

57 views. Asked by soumeng78 on 13 January 2024 at 01:47

I have trained an XGBRegressor model. Now I am trying to store the important features as an attribute on the model, and I want that attribute to be saved and restored along with the model.

I have two issues here:

1. Lint warnings on setattr and getattr:

regressor.fit(X=X_train, y=y_train, eval_set=[(X_train, y_train), (X_validation, y_validation)], verbose=False)

feature_importance: List[Tuple[str, float]] = sorted(
    regressor.get_booster().get_score(importance_type="gain").items(), key=lambda x: x[1]
)
selected_features: List[str] = [x[0] for x in feature_importance if x[1] > 0]
setattr(regressor, "selected_features", selected_features)

The setattr call and the corresponding getattr call give me lint warnings (B010 and B009). Is there a better way to do this that avoids those warnings?

The getattr usage looks something like this:

def get_model_features(model: XGBRegressor) -> List[str] | None:
    return getattr(model, "selected_features") if (model is not None and isinstance(model, XGBRegressor)) else None

2. The attribute does not get saved in the JSON file. I am using the following call to save:

regressor.save_model(fname="model.json")

How can I accomplish this? I want to avoid pickle save/restore.
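
On the first issue: B009 and B010 flag getattr and setattr calls whose attribute name is a constant string, because plain attribute access does the same thing. The sketch below shows one way to write the two calls so that, as far as the rule descriptions go, neither warning should fire. StandInRegressor is a hypothetical stand-in for XGBRegressor, used only so the example runs without xgboost, and the feature names are made up.

```python
from typing import List, Optional


class StandInRegressor:
    """Hypothetical stand-in for XGBRegressor so the sketch runs without xgboost."""


def get_model_features(model: Optional[StandInRegressor]) -> Optional[List[str]]:
    # isinstance() is False for None, so a separate "is not None" check is redundant.
    if not isinstance(model, StandInRegressor):
        return None
    # Three-argument getattr returns the default when the attribute was never set.
    return getattr(model, "selected_features", None)


regressor = StandInRegressor()
# Plain assignment does the same thing as setattr(regressor, "selected_features", ...).
regressor.selected_features = ["f0", "f3"]

print(get_model_features(regressor))
print(get_model_features(None))
```

The three-argument getattr is kept here on purpose: it supplies a default for models that never had the attribute set, which plain `model.selected_features` would not. A static type checker may still object to assigning an attribute that the class does not declare; that is a separate matter from the lint warnings.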

There are 2 answers

soumeng78 On 24 January 2024 at 18:03

Based on input from @user1808924 and from a colleague, I finally did something similar to the following:

import base64
import json
import pickle
from typing import Any, Dict, List, Tuple

regressor.fit(X=X_train, y=y_train, eval_set=[(X_train, y_train), (X_validation, y_validation)], verbose=False)

# Rank features by gain and keep those with a positive score.
feature_importance: List[Tuple[str, float]] = sorted(
    regressor.get_booster().get_score(importance_type="gain").items(), key=lambda x: x[1]
)
selected_features: List[str] = [x[0] for x in feature_importance if x[1] > 0]

# Pickle the regressor, base64-encode it so it fits in JSON, and store the features next to it.
model_data: Dict[str, Any] = {
    "model": base64.b64encode(pickle.dumps(regressor)).decode("utf-8"),
    "features": selected_features,
}

with open("XGBModel.json", "w") as json_file:
    json.dump(model_data, json_file)
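
The snippet above only covers saving. A matching loader would reverse the same steps: read the JSON, base64-decode the "model" field, and unpickle it. The sketch below is one way to do that; load_bundle is a hypothetical helper name, and a plain dict stands in for the trained regressor so the round trip runs without xgboost.

```python
import base64
import json
import os
import pickle
import tempfile
from typing import Any, List, Tuple


def load_bundle(path: str) -> Tuple[Any, List[str]]:
    """Read the JSON file and rebuild the pickled object plus the feature list."""
    with open(path) as json_file:
        model_data = json.load(json_file)
    # Reverse the save steps: base64 text -> bytes -> unpickled object.
    model = pickle.loads(base64.b64decode(model_data["model"]))
    return model, model_data["features"]


# Round trip with a plain dict standing in for the trained regressor (assumption).
stand_in_model = {"weights": [0.1, 0.2]}
model_data = {
    "model": base64.b64encode(pickle.dumps(stand_in_model)).decode("utf-8"),
    "features": ["f0", "f3"],
}
path = os.path.join(tempfile.mkdtemp(), "XGBModel.json")
with open(path, "w") as json_file:
    json.dump(model_data, json_file)

restored_model, restored_features = load_bundle(path)
```

Note that this file is JSON only on the outside. The model itself is still a pickle, so it should be loaded only from trusted sources, and it carries the usual pickle dependence on compatible library versions.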

user1808924 On 13 January 2024 at 08:33 (Accepted Answer)

"The attribute does not get saved in the json file"

This is the expected behaviour.

The XGBRegressor.save_model(fname) method call simply "redirects" to the Booster.save_model(fname) method call. Any attributes that were defined in the top-most Scikit-Learn layer (such as custom feature importance attributes) will not be propagated along.

The underlying XGBoost model saver/loader (via JSON/UBJSON) does not contain any logic for maintaining custom model metadata. It stores only real model data, which is actually used by XGBoost itself.

If you want to save Scikit-Learn wrappers with custom attributes, then you must keep using the pickle data format. There is no way around that.
