I am trying to create an ordinal regression tree in R using `rpart`, with the predictors mostly being ordinal data, stored as `factor` in R.
When I create the tree using `rpart`, I get something like this:
where the values are the factor values (e.g. `A170` has labels ranging from -5 to 10).
However, when I use `caret` to `train` the data using `rpart` and then extract the final model, the tree no longer has ordinal predictors. See below for a sample output tree:
As you can see above, the ordinal variable `A170` has now been converted into multiple dummy variables, i.e. `A17010` in the second tree is a dummy for `A170` taking the value 10.
So, is it possible to retain ordinal variables instead of converting factor variables into multiple binary indicator variables when fitting trees with the `caret` package?
Let's start with a reproducible example:
As you note, training with the `rpart` function groups the factor levels together:

I was able to reproduce the caret package splitting up the factors into their individual levels using the formula interface to the `train` function:

The solution I found to avoid splitting factors by level is to input raw data frames to the `train` function instead of using the formula interface:
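
As a minimal sketch of the three fits described above, the following uses a simulated data frame rather than the data from the question. The data frame `d`, the outcome `y`, the extra predictor `x`, and the level range 1 to 10 are all assumptions made for illustration; only the column name `A170` is borrowed from the question.

```r
library(rpart)
library(caret)

set.seed(42)
n <- 300

# Hypothetical stand-in data: A170 is a factor with levels 1..10
# (the question's variable runs from -5 to 10; positive levels keep names simple)
d <- data.frame(
  A170 = factor(sample(1:10, n, replace = TRUE), levels = 1:10),
  x    = rnorm(n)
)
# Outcome that depends on the factor level, plus noise
d$y <- 0.5 * as.integer(as.character(d$A170)) + rnorm(n)

# 1. Plain rpart: splits group factor levels together under the name A170
fit_rpart <- rpart(y ~ A170 + x, data = d)

ctrl <- trainControl(method = "cv", number = 5)

# 2. caret formula interface: the factor is expanded into indicator columns
#    (e.g. A17010) before rpart sees the data
fit_formula <- train(y ~ A170 + x, data = d,
                     method = "rpart", trControl = ctrl)

# 3. caret x/y interface: the data frame is passed through as-is,
#    so A170 remains a single factor predictor
fit_xy <- train(x = d[, c("A170", "x")], y = d$y,
                method = "rpart", trControl = ctrl)

# Compare the variable names used in the splits of each final tree
print(unique(fit_rpart$frame$var))
print(unique(fit_formula$finalModel$frame$var))
print(unique(fit_xy$finalModel$frame$var))
```

Printing `fit_formula$finalModel` and `fit_xy$finalModel` side by side should make the difference visible: the formula fit can only split on individual level indicators, while the x/y fit can split on groups of `A170` levels as plain `rpart` does.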