Why does using the optimized parameters (MSE as the minimization objective) in XGBRegressor give me a different RMSE than the optimized RMSE?

Question

Asked by Hang Nguyen on 19 July 2019 at 01:55

I am working on a regression problem. I am at the step of tuning the parameters of an XGBRegressor model, so I use the GPyOpt library to find the optimized parameters. The optimizer returns an array of 5 elements and the minimized MSE, which is 1813. But when I plug the optimized parameters into the model, the MSE it returns is 2810. Why does that happen?

I am not really familiar with GPyOpt, and there isn't much information about this problem, so I wonder whether it comes from a careless mistake on my part or from something I don't understand.

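import numpy as np
import GPyOpt
from GPyOpt.methods import BayesianOptimization
from sklearn.model_selection import cross_val_score
from xgboost import XGBRegressor

def cv_score(parameters):
    # GPyOpt passes a 2-D array of shape (1, n_params); take the single row
    parameters = parameters[0]
    score = cross_val_score(
                XGBRegressor(learning_rate=parameters[0],
                              gamma=int(parameters[1]),  # note: int() truncates this continuous value
                              max_depth=int(parameters[2]),
                              n_estimators=int(parameters[3]),
                              min_child_weight = parameters[4]), 
                x_train, y_train, scoring='neg_mean_squared_error').mean()
    # Mean negated MSE over the CV folds of the training data (higher is better)
    score = np.array(score)
    return score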

bds = [{'name': 'learning_rate', 'type': 'continuous', 'domain': (0, 1)},
        {'name': 'gamma', 'type': 'continuous', 'domain': (0, 5)},
        {'name': 'max_depth', 'type': 'discrete', 'domain': (1, 50)},
        {'name': 'n_estimators', 'type': 'discrete', 'domain': (1, 300)},
        {'name': 'min_child_weight', 'type': 'discrete', 'domain': (1, 10)}]


optimizer = BayesianOptimization(f=cv_score, domain=bds,
                                 model_type='GP',
                                 acquisition_type ='EI',
                                 acquisition_jitter = 0.05,
                                 exact_feval=True, 
                                 maximize=True)
optimizer.run_optimization(max_iter=20)

optimizer.x_opt

array([ 0.56133897, 2.697656 , 50. , 300. , 10. ])

# SklearnExtra, Seed and evaluate are helpers defined elsewhere (not shown)
xgb_final_param = {'learning_rate': 0.56133897, 'gamma': 2.697656, 'max_depth': 50, 'n_estimators': 300, 'min_child_weight': 10}
xgb_final = SklearnExtra(clf = XGBRegressor(), seed = Seed, params = xgb_final_param)
xgb_final.fit(x_train, y_train)
evaluate(xgb_final, x_test, y_test)  # evaluate returns the MSE on the test set

I expected the MSE to be roughly 1813, but I got 2810. Why?
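For context, the two numbers are also computed on different data. The 1813 is the optimizer's objective, i.e. the negated mean of cross_val_score(..., scoring='neg_mean_squared_error') over folds of x_train, whereas 2810 is what evaluate reports on x_test. The numpy-only sketch below computes both kinds of estimate side by side. It assumes synthetic data, a least-squares line standing in for XGBRegressor, and k = 5 contiguous folds (scikit-learn uses unshuffled KFold for regressors when cv is an integer or left at its default). It is an illustration of why the two figures need not coincide, not a reproduction of the question's numbers.

```python
import numpy as np


def kfold_neg_mse(x, y, k=5):
    """Mean negated MSE over k contiguous folds, mirroring
    cross_val_score(..., scoring='neg_mean_squared_error').mean().
    A least-squares line stands in for XGBRegressor (assumption)."""
    idx = np.arange(len(y))
    scores = []
    for test_idx in np.array_split(idx, k):
        train_idx = np.setdiff1d(idx, test_idx)
        # Fit y ~ a*x + b on the training folds only
        a, b = np.polyfit(x[train_idx], y[train_idx], deg=1)
        pred = a * x[test_idx] + b
        mse = np.mean((y[test_idx] - pred) ** 2)
        scores.append(-mse)  # sklearn negates errors so that higher is better
    return float(np.mean(scores))


# Synthetic data (assumption): a noisy straight line
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 3 * x + rng.normal(0, 2, size=200)

x_train, y_train = x[:150], y[:150]
x_test, y_test = x[150:], y[150:]

# What the optimizer sees: a CV estimate computed on the training data only
cv_neg_mse = kfold_neg_mse(x_train, y_train)

# What a final hold-out evaluation sees: refit on all training data, score on test
a, b = np.polyfit(x_train, y_train, deg=1)
test_mse = float(np.mean((y_test - (a * x_test + b)) ** 2))

print(-cv_neg_mse, test_mse)  # two different estimates of the same error
```

Even with identical hyperparameters the CV figure and the test figure are separate estimates, so some gap is expected; a gap as large as 1813 vs 2810 is worth checking against the points raised in the answer.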

Original Q&A

**Andrei** · Accepted Answer · 2019-08-07T08:18:24+00:00

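Discrete variables in GPyOpt aren't specified by their min/max but by the full list of their values. Why? Because the values may be discontinuous; for example, your variable might only take the values (1, 3, 8). See an example of it here. So a domain of (1, 50) means just the two values 1 and 50, not the range between them. That is consistent with your x_opt: 50, 300 and 10 are exactly the upper ends of your three discrete domains.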
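To illustrate the difference between the two readings of a discrete domain, the sketch below assumes the optimizer maps each proposal to the closest allowed value. snap_to_domain is a hypothetical helper written for this illustration; it is not GPyOpt's internal code.

```python
def snap_to_domain(value, domain):
    """Return the allowed value in `domain` closest to `value`.

    Hypothetical helper showing how a discrete variable is restricted
    to the values it lists; not GPyOpt's own implementation.
    """
    return min(domain, key=lambda allowed: abs(allowed - value))


as_written = (1, 50)               # read as two values, not as "1 to 50"
as_intended = tuple(range(1, 51))  # every depth from 1 to 50

for proposal in (7.2, 23.9, 31.4):
    # With as_written, every proposal collapses to either 1 or 50
    print(proposal,
          snap_to_domain(proposal, as_written),
          snap_to_domain(proposal, as_intended))
```

With the two-value domain, any proposal can only ever become 1 or 50, so the search never explores intermediate depths.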

So in your example, the proper way to specify the domain for these variables is to list all possible values:

{'name': 'max_depth', 'type': 'discrete', 'domain': list(range(1, 51))}

Do the same for the other discrete variables. Continuous variables are fine as they are, since they are specified by their range.
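A minimal sketch of the question's search space with the fix applied to all three discrete variables, following the list(range(...)) form shown above. It also routes each GPyOpt row through one hypothetical helper, decode_params, so the CV objective and the final model receive identical values. In the question, cv_score passes int(parameters[1]) as gamma (2 for the reported optimum) while xgb_final_param uses 2.697656, so the two were not scored with exactly the same parameters. The helper and its name are assumptions, not part of GPyOpt or XGBoost.

```python
# Search space: continuous variables keep their (min, max) range, while
# discrete variables enumerate every allowed value, upper bound included.
bds = [
    {'name': 'learning_rate', 'type': 'continuous', 'domain': (0, 1)},
    {'name': 'gamma', 'type': 'continuous', 'domain': (0, 5)},
    {'name': 'max_depth', 'type': 'discrete', 'domain': list(range(1, 51))},
    {'name': 'n_estimators', 'type': 'discrete', 'domain': list(range(1, 301))},
    {'name': 'min_child_weight', 'type': 'discrete', 'domain': list(range(1, 11))},
]


def decode_params(row):
    """Map one GPyOpt row to XGBRegressor keyword arguments.

    Hypothetical helper: call it inside the objective and again when
    building the final model so both use exactly the same values.
    """
    learning_rate, gamma, max_depth, n_estimators, min_child_weight = row
    return {
        'learning_rate': float(learning_rate),
        'gamma': float(gamma),                # continuous: keep the fraction
        'max_depth': int(max_depth),          # integer-valued tree depth
        'n_estimators': int(n_estimators),
        'min_child_weight': float(min_child_weight),
    }


# The optimum reported in the question, decoded once for both uses
params = decode_params([0.56133897, 2.697656, 50.0, 300.0, 10.0])
print(params)
```

Inside cv_score this would become XGBRegressor(**decode_params(parameters[0])), and the final model would be built from the same dictionary, so any remaining gap comes from the data split rather than from mismatched parameters.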

TechQA.
