How does tweedie nloglike in XGBoost relate to the actual nloglike?


When viewing the code for how XGBoost calculates the Tweedie evaluation metric (tweedie-nloglik), we can see that it is computed as:

bst_float a = y * std::exp((1 - rho_) * std::log(p)) / (1 - rho_);
bst_float b = std::exp((2 - rho_) * std::log(p)) / (2 - rho_);
return -a + b;

Source: lines 310-313 of https://github.com/dmlc/xgboost/blob/master/src/metric/elementwise_metric.cu
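Writing the exp/log form as plain powers ($\exp((1-\rho)\log(p)) = p^{1-\rho}$), the returned value is

$$-a + b = -\frac{y\,p^{1-\rho}}{1-\rho} + \frac{p^{2-\rho}}{2-\rho}.$$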

The expression shows similarities with the expression for the Tweedie deviance for power values between 1 and 2, but there does not seem to be an exact mapping. The Tweedie deviance according to Wikipedia:

$$d(y, \mu) = 2\left(\frac{\max(y, 0)^{2-p}}{(1-p)(2-p)} - \frac{y\,\mu^{1-p}}{1-p} + \frac{\mu^{2-p}}{2-p}\right)$$

If I remove constants and take the negative logarithm of the expression from Wikipedia, I do not end up with an expression that equals -a + b from XGBoost. My question is then: what is the value that XGBoost calculates, and how does it relate to the negative log-likelihood?

Thanks!

1 answer

Answer by ACifonelli:

I know it is a bit of an old question, but I will leave my findings here (which, of course, could be wrong). XGBoost, as well as LightGBM, uses the Tweedie loss in the context of Generalized Linear Models (GLMs). Let's say that our response variable follows a Poisson distribution

$$y \sim \mathit{Poisson}(\mu)$$

and that $\mu$ is obtained as a linear combination of several covariates grouped in the vector $x$: $\mu = w^T x$. We know, however, that $\mu$ is subject to some restrictions imposed by the distribution that we are using. For the Poisson we need it to be positive, so we can use the exponential function such that

$$\mu = \exp(w^T x)$$

or, to ease the computation

$$\log(\mu) = w^T x.$$
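To see why this eases the computation, note that under the log link the Poisson log-likelihood becomes linear in the raw score $\eta = w^T x$ (a standard GLM fact, spelled out here for clarity):

$$\log L(y; \eta) = y \log(\mu) - \mu - \log(y!) = y\,\eta - e^{\eta} - \log(y!).$$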

The $\log()$ is called the link function; it is the default choice for several distributions (Tweedie included) and is also linked to the canonical form of Exponential Dispersion Models (EDMs). Check the following source for more details: https://bookdown.org/steve_midway/BHME/Ch7.html

Since we have log-transformed $\mu$, the formula will not contain the power $\mu^{1 - p}$ directly; on the log scale it becomes the product $(1 - p)\log(\mu)$, and with $\exp((1 - p)\log(\mu))$ we retrieve the correct quantity to be plugged into the log-likelihood.
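Concretely (a standard fact about the Tweedie family, filled in here for completeness): for $1 < \rho < 2$, the part of the Tweedie log-likelihood that depends on $\mu$ is, up to the dispersion $\phi$ and a normalizing term that depends only on $y$,

$$\log L(y; \mu) \propto \frac{y\,\mu^{1-\rho}}{1-\rho} - \frac{\mu^{2-\rho}}{2-\rho} = \frac{y\,e^{(1-\rho)\log(\mu)}}{1-\rho} - \frac{e^{(2-\rho)\log(\mu)}}{2-\rho},$$

and negating it gives exactly the -a + b returned by the code. This also explains why it does not match the deviance term by term: the deviance adds a factor of 2 and the $y$-only term $y^{2-\rho}/\big((1-\rho)(2-\rho)\big)$, neither of which affects model comparison.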

To conclude and summarize, the metric implemented in XGBoost is just the Tweedie negative log-likelihood, up to terms that do not depend on $\mu$, with the $\log()$ function used as link function.
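As a sanity check, here is a minimal, self-contained sketch (mine, not XGBoost code; the function names are illustrative) that evaluates the XGBoost expression next to the Wikipedia deviance:

```cpp
// A minimal sanity-check sketch: evaluate the expression from
// elementwise_metric.cu next to the Wikipedia Tweedie deviance and show
// that, for 1 < rho < 2, they differ only by a factor of 2 and a term
// that depends on y alone.
#include <cmath>
#include <cstdio>

// The XGBoost metric expression: -a + b.
double xgb_tweedie_nloglik(double y, double p, double rho) {
  double a = y * std::exp((1 - rho) * std::log(p)) / (1 - rho);
  double b = std::exp((2 - rho) * std::log(p)) / (2 - rho);
  return -a + b;
}

// Tweedie unit deviance for 1 < rho < 2 (Wikipedia form, y >= 0).
double tweedie_deviance(double y, double mu, double rho) {
  return 2.0 * (std::pow(y, 2 - rho) / ((1 - rho) * (2 - rho))
                - y * std::pow(mu, 1 - rho) / (1 - rho)
                + std::pow(mu, 2 - rho) / (2 - rho));
}

int main() {
  const double rho = 1.5;
  const double ys[] = {0.5, 1.0, 3.0};
  const double mus[] = {0.8, 2.0};
  for (double y : ys) {
    for (double mu : mus) {
      double m = xgb_tweedie_nloglik(y, mu, rho);
      double d = tweedie_deviance(y, mu, rho);
      // d - 2*m should be constant across mu for a fixed y.
      std::printf("y=%.1f mu=%.1f metric=%.6f deviance=%.6f d-2m=%.6f\n",
                  y, mu, m, d, d - 2 * m);
    }
  }
  return 0;
}
```

For a fixed y, the printed d - 2m column stays constant as mu varies, which is consistent with the claim that the metric is the negative log-likelihood stripped of everything that does not depend on $\mu$.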