I have been using rpart to train a supervised decision tree model with a binary response. The problem with the results is that some features get split multiple times in a non-monotonic way. For instance, feature A might be split into three intervals, [0, 0.4], [0.4, 0.6], [0.6, 1], with predicted responses -1, 1, -1 respectively. I would prefer that each feature be split only once, into two intervals. Is there a way to do that in R?
An illustrative example:
Suppose I am interested in predicting college dropout rate from SAT score. Then the tree or rpart package in R might give me the following model:
1. SAT > 1100: no dropout
2. SAT <= 1100:
   3. SAT > 900: dropout
   4. SAT <= 900: no dropout
While this might be the best binary tree model given the training data, I want to inject my domain knowledge that the relation between SAT score and dropout probability should be monotone, and enforce a single SAT threshold for determining the dropout probability.
So my question is whether there is a way to enforce monotonicity, in the sense above, in R.
You can also try the party package, where you can enforce a single split.
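A minimal sketch of what that could look like, assuming the single-predictor SAT example from the question. The data below are synthetic and purely illustrative, and the column names `sat` and `dropout` are made up for this example. It uses the `stump` argument of `ctree_control()`, which restricts the tree to at most one split:

```r
# Synthetic data for illustration only (not from the original question).
library(party)

set.seed(1)
n <- 300
sat <- round(runif(n, 600, 1600))
# Hypothetical ground truth: dropout becomes less likely as SAT increases.
dropout <- factor(ifelse(runif(n) < plogis((1000 - sat) / 100), "yes", "no"))
d <- data.frame(sat = sat, dropout = dropout)

# stump = TRUE asks ctree for a tree with at most one split (the root only).
fit <- ctree(dropout ~ sat, data = d,
             controls = ctree_control(stump = TRUE))
print(fit)

# A stump has at most two terminal nodes, i.e. a single SAT threshold.
n_leaves <- length(unique(where(fit)))
```

Note that a stump allows one split in the whole tree, not one split per feature, so with several predictors this is stricter than what the question asks for. With a single predictor such as SAT it yields exactly one threshold.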