In R, I am trying to build an rpart classification tree with caret using:
tree <- caret::train(LoanStatus ~ ., data = home_training, method = "rpart")
Everything is fine until I try to predict:
predictions <- predict(tree$finalModel, newdata = home_validation, type = "class")
This fails with the error: Error in eval(predvars, data, env): object 'Gender1' not found
I then noticed that R seems to have duplicated some of my predictor variables (they are factors):
varImp(tree) outputs:
ApplicantIncome 2.218022
CoapplicantIncome 4.564288
Education1 6.214741
LoanAmount 7.183707
LoanAmountTerm 1.554240
Married1 6.895146
PropertyArea1 5.806154
Gender1 0.000000
Dependents1 0.000000
Dependents2 0.000000
Dependents3 0.000000
SelfEmployed1 0.000000
PropertyArea2 0.000000
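This list contains what look like duplicates of my factor predictors (Gender1, Dependents1, Dependents2, Dependents3, and so on), none of which are column names in my data.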
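For context, the suffixed names resemble the dummy columns that R's formula machinery produces when it expands factors. The following is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not my real loan data), assuming the factors are coded with levels such as "0", "1", "2":

```r
# Hypothetical toy data; only the column names mirror the real data set.
toy <- data.frame(
  Gender       = factor(c("0", "1", "1", "0")),
  PropertyArea = factor(c("0", "1", "2", "1")),
  LoanAmount   = c(100, 120, 90, 150)
)

# model.matrix() expands each factor into one indicator column per
# non-reference level, naming it <variable><level>.
dummy_names <- colnames(model.matrix(~ ., data = toy))
print(dummy_names)
# The result includes "Gender1", "PropertyArea1" and "PropertyArea2".
```

If that is what is happening, the extra entries are indicator columns for factor levels rather than true duplicates.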
If I do the same using rpart directly, with: tree2 <- rpart(LoanStatus ~ ., home_training, method = "class")
I get no errors and no duplicated variables.
I want to do this with caret because it lets me use cross-validation.
How can I fix this?
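One direction that may help, sketched here on made-up data rather than my real set: call predict() on the train object itself instead of on tree$finalModel, so that caret can apply the same factor expansion to the new data. This assumes the caret and rpart packages are installed; `toy`, `toy_train` and `toy_valid` are hypothetical stand-ins for home_training and home_validation.

```r
library(caret)

set.seed(1)
n <- 200
# Hypothetical data: one factor predictor, one numeric predictor.
toy <- data.frame(
  Gender     = factor(sample(c("0", "1"), n, replace = TRUE)),
  LoanAmount = rnorm(n, mean = 120, sd = 30)
)
toy$LoanStatus <- factor(
  ifelse(toy$LoanAmount + rnorm(n, sd = 10) > 120, "Y", "N")
)
toy_train <- toy[1:150, ]
toy_valid <- toy[151:200, ]

# Same formula interface as in the question, with 5-fold cross-validation.
fit <- caret::train(
  LoanStatus ~ ., data = toy_train, method = "rpart",
  trControl = trainControl(method = "cv", number = 5)
)

# Predicting through the train object; for classification this returns
# a factor of predicted classes by default.
preds <- predict(fit, newdata = toy_valid)

# Predicting through finalModel with the raw data is expected to fail,
# because that model was fitted on the expanded dummy columns.
raw_attempt <- tryCatch(
  predict(fit$finalModel, newdata = toy_valid, type = "class"),
  error = function(e) conditionMessage(e)
)
```

Another option might be the non-formula interface, train(x = ..., y = ...), which as far as I understand passes factor columns through without creating dummy variables.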