PyCaret Classification - "finalize_model" gives very different results than "compare_models"


Toolset Versions:

python: 3.10.13 | packaged by Anaconda, Inc. | (main, Sep 11 2023, 13:24:38) [MSC v.1916 64 bit (AMD64)]
pycaret: 3.1.0


First post - please be gentle if I don't provide all the info needed. ;)

I'm using pycaret to perform a binary classification. Here is my setup:

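from pycaret.classification import ClassificationExperiment

# filtered_train: pandas DataFrame with the features and the 'diseaseweek' target column
exp = ClassificationExperiment()
exp.setup(filtered_train, target='diseaseweek', fix_imbalance=True, log_experiment=True,
          experiment_name='exp1 - full feature set, PCA',
          normalize=True, remove_multicollinearity=True, pca=True, pca_components=0.95,
          session_id=123)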
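For reference, with the default linear PCA a float `pca_components` such as 0.95 means "keep the smallest number of components whose cumulative explained variance is greater than 95%". Below is a minimal pure-Python sketch of that selection rule. It assumes you already have the per-component explained-variance ratios sorted in descending order. `n_components_for_variance` is a hypothetical helper for illustration and is not a PyCaret function.

```python
def n_components_for_variance(ratios, threshold=0.95):
    """Return the smallest k whose first k explained-variance ratios sum to more than threshold.

    Hypothetical helper for illustration; not part of PyCaret.
    `ratios` must be sorted in descending order, as PCA reports them.
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must lie strictly between 0 and 1")
    cumulative = 0.0
    for k, ratio in enumerate(ratios, start=1):
        cumulative += ratio
        if cumulative > threshold:
            return k
    # Floating-point rounding can leave the total just short of the threshold; keep everything.
    return len(ratios)


# Toy ratios, not taken from the question's data.
ratios = [0.70, 0.20, 0.06, 0.04]
print(n_components_for_variance(ratios))  # cumulative 0.70, 0.90, 0.96 -> prints 3
```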

[Image: PyCaret setup output]

And here are the results of compare_models(). I've left the holdout split at the PyCaret default.

[Image: compare_models() results]

This is the resulting ROC curve:

[Image: initial ROC curve]

I finalize the model:

[Image: model finalization output]

And get this resulting ROC curve:

[Image: ROC curve after finalization]
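To compare the two ROC curves numerically rather than by eye, you can compute the area under each curve from the predicted positive-class scores on the same rows. Here is a minimal stdlib sketch using the rank (Mann-Whitney) formulation of AUC, in which tied scores count as half a win. `roc_auc` is a hypothetical helper. In practice, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity.

```python
def roc_auc(labels, scores):
    """AUC = probability that a random positive outscores a random negative (ties count 0.5).

    Hypothetical helper for illustration; labels are 1 (positive) or 0 (negative).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores, not the data from the question.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.3, 0.2]
print(roc_auc(labels, scores))  # prints 0.75
```

The pairwise loop is O(n·m), which is fine for a quick check on a modest holdout set.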

Any guidance on the levers I can pull to reduce the overfitting I'm seeing? Thanks!

I trained a PyCaret classification model using the default 10-fold stratified cross-validation and the default 70:30 holdout split, and saw good cross-validation performance. But when I finalized the model on the entire dataset, the final model's performance dropped sharply. I'm not sure what actions I can take to reduce the overfitting.

I should add that I get very similar results with a tuned version of the same model (tuned over 50 iterations).
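The evaluation protocol described above can be sketched with plain index bookkeeping: a stratified 70:30 train/holdout split, 10-fold stratified CV on the 70% portion, then `finalize_model` refitting on all rows. This is a simplified stdlib illustration that assumes integer class labels. It is not PyCaret's internal code, and the helper names `stratified_holdout` and `stratified_kfold` are hypothetical. The last lines show that once the model is refit on train + holdout, every holdout row has been used for training.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, train_size=0.7, seed=123):
    """Split row indices into (train, holdout), preserving each class's share (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, holdout = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_size)
        train.extend(idx[:cut])
        holdout.extend(idx[cut:])
    return sorted(train), sorted(holdout)


def stratified_kfold(labels, indices, k=10, seed=123):
    """Deal indices into k folds class by class so each fold gets a similar class mix (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    pos = 0  # running position keeps fold sizes balanced across classes
    for idx in by_class.values():
        rng.shuffle(idx)
        for i in idx:
            folds[pos % k].append(i)
            pos += 1
    return folds


# Toy imbalanced labels (80 negatives, 20 positives), not the question's data.
labels = [0] * 80 + [1] * 20
train, holdout = stratified_holdout(labels)
folds = stratified_kfold(labels, train, k=10)
print(len(train), len(holdout), sum(labels[i] for i in holdout))  # prints 70 30 6
print([len(f) for f in folds])  # ten folds of 7 rows each

# finalize_model refits on train + holdout, so the holdout rows become training rows.
final_rows = set(train) | set(holdout)
print(len(set(holdout) & final_rows), "of", len(holdout), "holdout rows are now training rows")
```

The final line is why a holdout plot made after finalization is an in-sample view for the refit model and is not directly comparable to the cross-validation scores.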

