Computing deviance for conditional inference trees

I am trying to use conditional inference trees (from the partykit package) as induction trees, whose purpose is to describe the data rather than to predict individual cases. According to Ritschard (in several of his papers on induction trees), a deviance measure can be obtained by cross-tabulating the observed and the tree-predicted distributions of the response variable over the possible profiles defined by the predictors, i.e., the so-called target table T and predicted table T̂. I would like to use the deviance and other derived statistics as goodness-of-fit (GOF) measures for objects returned by the ctree() function. As I am new to this topic, I would very much appreciate some input, such as a piece of R code or some guidance on which parts of a ctree object are needed for the computation. My own idea was to build both the target and the predicted tables from scratch and then apply the deviance formula, but I am not at all confident about how to proceed.
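A minimal sketch of the table-based idea from the question, assuming the usual form of Ritschard's deviance for induction trees, D = 2 Σ_i Σ_j n_ij log(n_ij / n̂_ij). Here n_ij are the counts of the target table T (response category i by predictor profile j) and n̂_ij are the counts of the predicted table T̂, obtained by spreading each profile total n_·j according to the class distribution of the terminal node that the profile falls into. Profiles are taken to be the distinct combinations of the predictor values, and the helper `induction_deviance()` is hypothetical, not part of partykit.

```r
library("partykit")

# Hypothetical helper (not part of partykit): Ritschard-style deviance of a
# classification tree, comparing the target table T (observed response
# counts per predictor profile) with the predicted table T-hat.
induction_deviance <- function(tree, data, response, predictors) {
  y <- data[[response]]
  # A profile is one distinct combination of the predictor values
  profile <- factor(do.call(paste, c(data[predictors], sep = "|")))
  # Target table T: response categories (rows) by profiles (columns)
  target <- table(y, profile)
  # Terminal node of each observation
  node <- predict(tree, newdata = data, type = "node")
  # Response distribution within each terminal node
  node_prob <- prop.table(table(node, y), margin = 1)
  # All observations of a profile share one node: use the first one's node
  prof_node <- node[match(colnames(target), as.character(profile))]
  # Predicted table T-hat: each profile total spread by its node's distribution
  predicted <- t(unclass(node_prob)[as.character(prof_node), , drop = FALSE]) *
    rep(colSums(target), each = nrow(target))
  # D = 2 * sum n_ij * log(n_ij / nhat_ij); empty cells contribute 0
  pos <- target > 0
  2 * sum(target[pos] * log(target[pos] / predicted[pos]))
}

irisct <- ctree(Species ~ ., data = iris)
d <- induction_deviance(irisct, iris, "Species",
                        c("Sepal.Length", "Sepal.Width",
                          "Petal.Length", "Petal.Width"))
d
```

When every profile contains a single response category (as happens with continuous predictors, where nearly every observation is its own profile), D reduces to -2 times the sum of the log predicted probabilities at the observed classes. With categorical predictors, profiles are shared by many observations and D measures the fit relative to the saturated, profile-wise model.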

Thanks a lot in advance!

Answer by Achim Zeileis (accepted as best answer)

Some background information first: We have discussed adding deviance() or logLik() methods for ctree objects. So far we haven't done so because conditional inference trees are not associated with a particular loss function or even likelihood. Instead, only the associations between response and partitioning variables are assessed by means of conditional inference tests using certain influence and regressor transformations. However, for the default regression and classification case, measures of deviance or log-likelihood can be a useful addition in practice. So maybe we will add these methods in future versions.

If you want trees associated with a formal deviance/likelihood, you may consider using the general mob() framework or the lmtree() and glmtree() convenience functions. If only partitioning variables are specified (and no further regressors to be used in every node), these often lead to trees very similar to those from ctree(). In addition, they support AIC(), logLik(), etc.
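A minimal sketch of this alternative, assuming the airq data from example("ctree", package = "partykit") and an intercept-only model in every node (everything after "|" is used only for partitioning). Assuming the default na.action (na.omit), observations with missing Solar.R are dropped, so this log-likelihood is based on fewer observations than the ctree-based numbers below and is not directly comparable with them.

```r
library("partykit")

# Same data as in example("ctree", package = "partykit")
airq <- subset(airquality, !is.na(Ozone))

# Constant (intercept-only) linear model in each node; the variables after
# "|" are only used as partitioning variables
airlmt <- lmtree(Ozone ~ 1 | Solar.R + Wind + Temp + Month + Day, data = airq)

# Likelihood-based measures are available for mob()-type trees
logLik(airlmt)
AIC(airlmt)
```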

But to come back to your original question: You can compute deviance/log-likelihood or other loss functions fairly easily if you look at the model response and the fitted response. Alternatively, you can extract a factor variable that indicates the terminal nodes and refit a linear or multinomial model. This will have the same fitted values but also supply deviance() and logLik(). Below, I illustrate this with the airct and irisct trees that you obtain when running example("ctree", package = "partykit").
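Regarding the structure of ctree objects: a fitted tree (class constparty) keeps the learning-sample response and the terminal node of each observation in its fitted data frame, with columns "(fitted)" (node id) and "(response)". A minimal sketch, assuming no case weights, that recovers the Gaussian deviance directly from these stored values:

```r
library("partykit")

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)

# Learning-sample information stored in the tree
fit <- airct$fitted
head(fit)  # "(fitted)" = terminal node id, "(response)" = observed Ozone

# Fitted value = mean response in the observation's terminal node
yhat <- ave(fit[["(response)"]], fit[["(fitted)"]])

# Gaussian deviance = residual sum of squares
rss <- sum((fit[["(response)"]] - yhat)^2)
rss
```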

Regression: The Gaussian deviance is simply the residual sum of squares:

sum((airq$Ozone - predict(airct, newdata = airq, type = "response"))^2)
## [1] 46825.35

The same can be obtained by re-fitting as a linear regression model:

airq$node <- factor(predict(airct, newdata = airq, type = "node"))
airlm <- lm(Ozone ~ node, data = airq)
deviance(airlm)
## [1] 46825.35
logLik(airlm)
## 'log Lik.' -512.6311 (df=6)

Classification: The log-likelihood is simply the sum of the predicted log-probabilities at the observed classes. And the deviance is -2 times the log-likelihood:

irisprob <- predict(irisct, type = "prob")
sum(log(irisprob[cbind(1:nrow(iris), iris$Species)]))
## [1] -15.18056
-2 * sum(log(irisprob[cbind(1:nrow(iris), iris$Species)]))
## [1] 30.36112

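Again, this can also be obtained by re-fitting as a multinomial model (up to the numerical precision of the iterative fit in multinom(), which explains the small discrepancy in the last digits):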

library("nnet")
iris$node <- factor(predict(irisct, newdata = iris, type = "node"))
irismultinom <- multinom(Species ~ node, data = iris, trace = FALSE)
deviance(irismultinom)
## [1] 30.36321
logLik(irismultinom)
## 'log Lik.' -15.1816 (df=8)
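The two computations above can be wrapped in a single helper that returns a standard "logLik" object, so that AIC() and friends work on it. This is only a sketch: `ctree_logLik()` is a hypothetical function (not a partykit method), it ignores case weights, and its degrees of freedom count only the node-wise parameters (as in the lm() and multinom() refits), not the search over splits.

```r
library("partykit")

# Hypothetical helper (not part of partykit): log-likelihood of a ctree
# for the default regression (Gaussian) and classification (multinomial) cases
ctree_logLik <- function(tree) {
  y <- tree$fitted[["(response)"]]
  node <- tree$fitted[["(fitted)"]]
  k <- length(unique(node))  # number of terminal nodes
  if (is.factor(y)) {
    # Class proportions per node, looked up at each observed class
    p <- prop.table(table(node, y), margin = 1)
    ll <- sum(log(p[cbind(as.character(node), as.character(y))]))
    df <- k * (nlevels(y) - 1)
  } else {
    # Gaussian log-likelihood with ML variance estimate RSS / n
    n <- length(y)
    rss <- sum((y - ave(y, node))^2)
    ll <- -n / 2 * (log(2 * pi) + log(rss / n) + 1)
    df <- k + 1  # node means plus the error variance
  }
  structure(ll, df = df, nobs = length(y), class = "logLik")
}

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)
irisct <- ctree(Species ~ ., data = iris)

ll_air <- ctree_logLik(airct)
ll_iris <- ctree_logLik(irisct)
AIC(ll_air)
-2 * as.numeric(ll_iris)  # deviance of the classification tree
```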

See also the discussion in https://stats.stackexchange.com/questions/6581/what-is-deviance-specifically-in-cart-rpart for the connections between regression and classification trees and generalized linear models.