How to build a classification tree with only binary splits in each feature variable (preferably in R)?

I have been using rpart to train a supervised decision tree model with a binary response. The problem with the results is that some features get split multiple times in a non-monotonic way. For instance, feature A might be split into three intervals, [0,0.4], [0.4,0.6], [0.6,1], with predicted responses -1, 1, -1 respectively. I would prefer that each feature gets split only once, in a binary way. Is there a way to do that in R?

An illustrating example:

Suppose I am interested in predicting college dropout rate from SAT score. Then the tree or rpart package in R might give me the following model:

1. SAT > 1100: no dropout
2. SAT <= 1100:
  3. SAT > 900: dropout
  4. SAT <= 900: no dropout

While this might be the best binary tree model given the training data, I want to inject my domain knowledge that the relation between SAT score and dropout probability should be monotone, and enforce a single SAT threshold for determining the dropout probability.

So my question is whether there is a way to enforce monotonicity, in the sense above, in R.

There is 1 answer

David Arenburg

You can also try the party package, which lets you enforce a single split. Compare the default tree with one fitted using the stump option of ctree_control():

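library(party)     # ctree() and ctree_control()
library(survival)  # loaded for the rats2 data set used below
# Default settings: the tree may split time1 more than once
plot(ctree(status ~ time1, rats2), type = "simple")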

# stump = TRUE restricts the tree to a single split (root plus two leaves)
plot(ctree(status ~ time1, rats2, controls = ctree_control(stump = TRUE)), type = "simple")

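Since the question started from rpart, a similar single-threshold tree can probably be obtained there as well by capping the depth at 1. The following is only a minimal sketch, not part of the original answer: the data are simulated, and the names sat, dropout, d, stump and n_splits are made up for illustration. It assumes that one split in total (rather than one split per feature in a deeper tree) is what is wanted.

```r
library(rpart)

set.seed(1)
# Simulated, hypothetical data: dropout is more likely below an SAT of about 1000
sat <- round(runif(500, 600, 1600))
p_dropout <- ifelse(sat < 1000, 0.8, 0.1)
dropout <- factor(ifelse(runif(500) < p_dropout, "yes", "no"))
d <- data.frame(sat = sat, dropout = dropout)

# maxdepth = 1 allows only the root split, so sat gets a single threshold
stump <- rpart(dropout ~ sat, data = d, method = "class",
               control = rpart.control(maxdepth = 1))

print(stump)

# Count internal (non-leaf) nodes to confirm that there is exactly one split
n_splits <- sum(stump$frame$var != "<leaf>")
```

With several predictors, maxdepth = 1 still yields one split overall, on whichever single feature rpart picks. It does not by itself guarantee one split per feature in a deeper tree.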