I'd like to preface by saying I'm pretty new to using xgboost, pandas, and numpy.
Currently I'm working on implementing a custom objective function for XGBoost based on the Kelly criterion. This approach is drawn from another post on datascience.stackexchange: https://datascience.stackexchange.com/questions/16186/kelly-criterion-in-xgboost-loss-function
From reading the documentation of XGBoost (https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html), I need to return the gradient and the hessian. The gradient of the function is:

$$\frac{\partial f}{\partial x} = \frac{-(b+1)p + bx + 1}{(x-1)(bx+1)}$$

The hessian of the function is:

$$\frac{\partial^2 f}{\partial x^2} = -\frac{b^2 p}{(bx+1)^2} - \frac{1-p}{(1-x)^2}$$
Where:
b = odds received on a wager
p = probability of winning
x = prediction from the algorithm
For this I'm going to treat p as a binary variable, 1 or 0, for whether the wager succeeded.
So, p = true outcome, 1 or 0
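For reference, these derivatives are what you get by differentiating the expected log-growth of the bankroll with respect to the stake x (my assumption about the objective being maximized, per the Kelly criterion):

$$f(x) = p\,\log(1 + bx) + (1 - p)\,\log(1 - x)$$

$$\frac{\partial f}{\partial x} = \frac{bp}{1 + bx} - \frac{1 - p}{1 - x} = \frac{-(b+1)p + bx + 1}{(x-1)(bx+1)}$$

$$\frac{\partial^2 f}{\partial x^2} = -\frac{b^2 p}{(1 + bx)^2} - \frac{1 - p}{(1 - x)^2}$$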
Using the documentation, I wrote the following code; I've also provided a small sample dataset:

    import numpy as np
    import xgboost as xgb

    kell_train_data = np.array([0.08396877, 0.07131547, 0.17921676, 0.22317006, 0.06278754, 0.29874458, 0.08079682, 0.13074108, 0.06416036, 0.12209199, 0.10400956, 0.28764891, 0.2913481, 0.09450234, 0.07858831, 0.09246751, 0.17008012, 0.29026032, 0.2741014, 0.05574227])
    odds_train = np.array([0.149254, 0.108696, 0.312500, 0.217391, 0.061350, 0.208333, 0.178571, 0.065359, 0.037453, 0.107527, 0.256410, 0.400000, 0.370370, 0.085470, 0.058140, 0.204082, 0.476190, 0.294118, 0.121951, 0.033003])
    y_train = np.array([0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0])

    kell_train_data = kell_train_data.reshape(kell_train_data.shape[0], -1)
    def gradient(y_pred, y_true, odds=odds_train):
        "Compute gradient of betting function"
        return (((-(odds+1)*y_true +odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))

    def hessian(y_pred, y_true, odds=odds_train):
        "Compute hessian of betting function"
        return (-(((odds**2)*y_true )/(odds*y_pred+1)**2)-((1-y_true)/((1-y_pred)**2)))

    def kellyobjfunc(y_pred, y_true, odds=odds_train):
        "Kelly objective function for xgboost"
        grad = gradient(y_pred, y_true, odds)
        hess = hessian(y_pred, y_true, odds)
        return grad, hess

    kell_mod = xgb.XGBClassifier(objective=kellyobjfunc, maximize=True)
    kell_mod.fit(kell_train_data, y_train)
However, when I run the above code I get the following error:
Traceback (most recent call last):
File "<ipython-input-623-18279e95b288>", line 1, in <module>
kell_mod.fit( kell_target, y_train)
File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\core.py", line 422, in inner_f
return f(**kwargs)
File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 919, in fit
callbacks=callbacks)
File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\training.py", line 214, in train
early_stopping_rounds=early_stopping_rounds)
File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\training.py", line 101, in _train_internal
bst.update(dtrain, i, obj)
File "C:\Users\USERR\Anaconda3\lib\site-packages\xgboost\core.py", line 1285, in update
grad, hess = fobj(pred, dtrain)
File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 49, in inner
return func(labels, preds)
File "<ipython-input-621-35f90873cb76>", line 14, in kellyobjfunc
grad = gradient(y_pred, y_true, odds)
File "<ipython-input-621-35f90873cb76>", line 5, in gradient
return (((-(odds+1)*y_true +odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))
TypeError: 'numpy.ndarray' object is not callable
I'm not sure what's causing this issue. Any insight or help would be appreciated.
So I found the error.

In the gradient function, the placement of the parentheses was causing the problem: (y_pred-1)(odds*y_pred+1) is missing a multiplication operator, so Python interprets it as calling the array (y_pred-1) as a function, which raises the TypeError.
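As a minimal standalone illustration of this pitfall (not from the model code above): writing two parenthesized NumPy expressions next to each other is parsed as a function call, not an implicit multiplication.

```python
import numpy as np

a = np.array([0.5, 0.25])
b = np.array([2.0, 4.0])

# (a - 1)(b + 1) is parsed as calling the array (a - 1) with argument (b + 1)
try:
    (a - 1)(b + 1)
except TypeError as e:
    print(e)  # 'numpy.ndarray' object is not callable

# The intended elementwise product needs an explicit * operator
print((a - 1) * (b + 1))
```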
Should actually be:

    return (-(odds + 1) * y_true + odds * y_pred + 1) / ((y_pred - 1) * (odds * y_pred + 1))
Also, the xgb model should be created without the maximize argument, which XGBClassifier does not accept:

    kell_mod = xgb.XGBClassifier(objective=kellyobjfunc)
The code now executes successfully.
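With the fix in place, a quick way to gain confidence in the derivatives is to compare them against finite differences of the log-wealth expression (this is a standalone sanity check I added, assuming that expression is the underlying objective):

```python
import numpy as np

def gradient(y_pred, y_true, odds):
    # Corrected version: explicit * between the two denominator factors
    return (-(odds + 1) * y_true + odds * y_pred + 1) / ((y_pred - 1) * (odds * y_pred + 1))

def hessian(y_pred, y_true, odds):
    return -((odds ** 2) * y_true) / (odds * y_pred + 1) ** 2 - (1 - y_true) / ((1 - y_pred) ** 2)

def log_wealth(y_pred, y_true, odds):
    # Per-bet expected log-growth the derivatives were presumably taken from
    return y_true * np.log(1 + odds * y_pred) + (1 - y_true) * np.log(1 - y_pred)

x = np.array([0.1, 0.2, 0.3])   # stakes (predictions)
p = np.array([1.0, 0.0, 1.0])   # outcomes
b = np.array([0.5, 0.3, 0.4])   # odds
eps = 1e-5

# Central finite differences for the first and second derivatives
num_grad = (log_wealth(x + eps, p, b) - log_wealth(x - eps, p, b)) / (2 * eps)
num_hess = (log_wealth(x + eps, p, b) - 2 * log_wealth(x, p, b) + log_wealth(x - eps, p, b)) / eps ** 2

# Both differences should be close to zero if the analytic forms are right
print(np.max(np.abs(gradient(x, p, b) - num_grad)))
print(np.max(np.abs(hessian(x, p, b) - num_hess)))
```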