I'm trying to fit an XGBoost regressor on a really large dataset. I want training to stop early if no improvement is made over 50 trees, and to print the evaluation metric every 10 trees (RMSE is my main metric).
My current code is the following:
import xgboost as xgb

#Building a training DMatrix from my training dataset
xgb_tr = xgb.DMatrix(data=x_train[predictors],
                     label=x_train['target'].values,
                     feature_names=predictors)
#Building a testing DMatrix from my testing dataset
xgb_te = xgb.DMatrix(data=x_test[predictors],
                     label=x_test['target'].values,
                     feature_names=predictors)

params_xgb = {
    'objective': 'reg:linear',  # deprecated alias of 'reg:squarederror' in newer versions
    'eval_metric': 'rmse'
}

best_xgb = xgb.train(params_xgb,
                     xgb_tr,
                     evals=[(xgb_tr, 'training'), (xgb_te, 'test')],
                     num_boost_round=3000,
                     early_stopping_rounds=50,  # stop if 'test' RMSE hasn't improved in 50 rounds
                     verbose_eval=10)           # print the metrics every 10 rounds
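In the actual run (visible in the traceback below) I also define an evals_results dict and pass it through xgb.train's evals_result argument so the per-round metric history gets recorded; a minimal sketch of that variant:

# Record the RMSE history for both evaluation sets, round by round
evals_results = {}
best_xgb = xgb.train(params_xgb,
                     xgb_tr,
                     evals=[(xgb_tr, 'training'), (xgb_te, 'test')],
                     evals_result=evals_results,  # filled in place during training
                     num_boost_round=3000,
                     early_stopping_rounds=50,
                     verbose_eval=10)
# afterwards, evals_results['test']['rmse'] holds one RMSE value per boosting round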
What I was expecting was something like this (this is the output from a LightGBM model; a sketch of the call that produced it follows the log):
Training until validation scores don't improve for 50 rounds
[10] train's rmse: 1.18004 valid's rmse: 1.10737
[20] train's rmse: 1.16906 valid's rmse: 1.09693
[30] train's rmse: 1.15957 valid's rmse: 1.08851
[40] train's rmse: 1.14905 valid's rmse: 1.07874
[50] train's rmse: 1.14026 valid's rmse: 1.07104
[60] train's rmse: 1.13104 valid's rmse: 1.06248
[70] train's rmse: 1.12265 valid's rmse: 1.05476
[80] train's rmse: 1.114 valid's rmse: 1.04638
[90] train's rmse: 1.10739 valid's rmse: 1.04018
[100] train's rmse: 1.10001 valid's rmse: 1.03354
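For reference, that log came from roughly this LightGBM setup (a reconstructed sketch; the callback style depends on the lightgbm version, and older versions take early_stopping_rounds/verbose_eval as train() arguments instead):

import lightgbm as lgb

# Same split, expressed as LightGBM Datasets
lgb_tr = lgb.Dataset(x_train[predictors], label=x_train['target'].values)
lgb_va = lgb.Dataset(x_test[predictors], label=x_test['target'].values, reference=lgb_tr)

params_lgb = {'objective': 'regression', 'metric': 'rmse'}

booster = lgb.train(params_lgb,
                    lgb_tr,
                    num_boost_round=3000,
                    valid_sets=[lgb_tr, lgb_va],
                    valid_names=['train', 'valid'],
                    callbacks=[lgb.early_stopping(50),   # stop after 50 rounds without improvement
                               lgb.log_evaluation(10)])  # log train/valid RMSE every 10 rounds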
But instead I got a puzzling error message:
---------------------------------------------------------------------------
XGBoostError Traceback (most recent call last)
<ipython-input-26-827da738fc42> in <module>
1 evals_results = {}
----> 2 best_xgb=xgb.train(params_xgb,
3 xgb_tr,
4 evals=[(xgb_tr,'training'), (xgb_te,'test')],
5 num_boost_round=3000,
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/xgboost/training.py in train(params, dtrain, num_boost_round, evals, obj, feval, maximize, early_stopping_rounds, evals_result, verbose_eval, xgb_model, callbacks, learning_rates)
210 callbacks.append(callback.reset_learning_rate(learning_rates))
211
--> 212 return _train_internal(params, dtrain,
213 num_boost_round=num_boost_round,
214 evals=evals,
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/xgboost/training.py in _train_internal(params, dtrain, num_boost_round, evals, obj, feval, xgb_model, callbacks)
72 # Skip the first update if it is a recovery step.
73 if version % 2 == 0:
---> 74 bst.update(dtrain, i, obj)
75 bst.save_rabit_checkpoint()
76 version += 1
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/xgboost/core.py in update(self, dtrain, iteration, fobj)
1106
1107 if fobj is None:
-> 1108 _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, ctypes.c_int(iteration),
1109 dtrain.handle))
1110 else:
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/xgboost/core.py in _check_call(ret)
174 """
175 if ret != 0:
--> 176 raise XGBoostError(py_str(_LIB.XGBGetLastError()))
177
178
XGBoostError: [17:24:56] src/tree/updater_histmaker.cc:311: fv=inf, hist.last=inf
Stack trace:
[bt] (0) 1 libxgboost.dylib 0x0000000116ac6319 dmlc::LogMessageFatal::~LogMessageFatal() + 57
[bt] (1) 2 libxgboost.dylib 0x0000000116b8bef4 xgboost::tree::CQHistMaker::HistEntry::Add(float, xgboost::detail::GradientPairInternal<float>) + 772
[bt] (2) 3 libxgboost.dylib 0x0000000116b8b6b3 xgboost::tree::CQHistMaker::UpdateHistCol(std::__1::vector<xgboost::detail::GradientPairInternal<float>, std::__1::allocator<xgboost::detail::GradientPairInternal<float> > > const&, xgboost::common::Span<xgboost::Entry const, -1ll> const&, xgboost::MetaInfo const&, xgboost::RegTree const&, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, unsigned int, std::__1::vector<xgboost::tree::CQHistMaker::HistEntry, std::__1::allocator<xgboost::tree::CQHistMaker::HistEntry> >*) + 643
[bt] (3) 4 libxgboost.dylib 0x0000000116b8d639 xgboost::tree::GlobalProposalHistMaker::CreateHist(std::__1::vector<xgboost::detail::GradientPairInternal<float>, std::__1::allocator<xgboost::detail::GradientPairInternal<float> > > const&, xgboost::DMatrix*, std::__1::vector<unsigned int, std::__1::allocator<unsigned int> > const&, xgboost::RegTree const&) + 1433
[bt] (4) 5 libxgboost.dylib 0x0000000116b834c4 xgboost::tree::HistMaker::Update(std::__1::vector<xgboost::detail::GradientPairInternal<float>, std::__1::allocator<xgboost::detail::GradientPairInternal<float> > > const&, xgboost::DMatrix*, xgboost::RegTree*) + 388
[bt] (5) 6 libxgboost.dylib 0x0000000116b82df0 xgboost::tree::HistMaker::Update(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, std::__1::vector<xgboost::RegTree*, std::__1::allocator<xgboost::RegTree*> > const&) + 144
[bt] (6) 7 libxgboost.dylib 0x0000000116b26296 xgboost::gbm::GBTree::BoostNewTrees(xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::DMatrix*, int, std::__1::vector<std::__1::unique_ptr<xgboost::RegTree, std::__1::default_delete<xgboost::RegTree> >, std::__1::allocator<std::__1::unique_ptr<xgboost::RegTree, std::__1::default_delete<xgboost::RegTree> > > >*) + 1766
[bt] (7) 8 libxgboost.dylib 0x0000000116b22566 xgboost::gbm::GBTree::DoBoost(xgboost::DMatrix*, xgboost::HostDeviceVector<xgboost::detail::GradientPairInternal<float> >*, xgboost::ObjFunction*) + 310
[bt] (8) 9 libxgboost.dylib 0x0000000116ac27cc xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*) + 1532
Has anyone come across this error? If not, is there a better way to implement XGBoost for regression with these callbacks?
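From the fv=inf in the error message I suspect there are infinite values in my features, so the first thing I plan to check is whether any predictor column contains inf (a minimal sketch, assuming x_train and x_test are pandas DataFrames):

import numpy as np

# Count +/-inf and NaN entries in the predictor columns of each split
for name, frame in [('x_train', x_train), ('x_test', x_test)]:
    values = frame[predictors].to_numpy(dtype=float)
    n_inf = np.isinf(values).sum()
    n_nan = np.isnan(values).sum()
    print(f'{name}: {n_inf} inf values, {n_nan} NaN values')
    if n_inf:
        # which predictors contain the infinities?
        inf_cols = np.array(predictors)[np.isinf(values).any(axis=0)]
        print('  inf found in:', list(inf_cols))

XGBoost treats NaN as missing by default, so my guess is that it's specifically the inf entries that break the histogram builder; if the check finds any, I'd clip or impute them before building the DMatrices.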