rpart doesn't build a full tree – problems with cp?


I'm trying to grow a full (unpruned) tree by passing control = rpart.control(minsplit = 2, minbucket = 1, cp = 0), but rpart stops after 4 splits. In the summary() output, the tree with 4 splits is listed with CP = 0, which I suspect is the reason; yet that tree isn't full, so I would expect its CP to be > 0.
I also checked the data, and further splits are possible. Here is my code:

#################
# libraries #####
library(datasets)
library(rpart)
library(rpart.plot)
##################
# preparing data #
titanic_obs=c()
for (cl in c("1st", "2nd", "3rd", "Crew")) {
  for (se in c("Male","Female")) {
    for (ag in c("Child","Adult")) {
      for (sur in c("Yes","No")) {
        titanic_obs = rbind(titanic_obs, matrix(rep(c(cl, se, ag, sur), length.out = 4 * Titanic[cl, se, ag, sur]), ncol = 4, byrow = TRUE))
      }
    }
  }
}

colnames(titanic_obs)= c("Class", "Sex", "Age","Survived")
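# R >= 4.0 no longer converts strings to factors by default, so request it explicitly
titanic_data = data.frame(titanic_obs, stringsAsFactors = TRUE)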
summary(titanic_data) 
#################
# fitting model #
titanic_rpart = rpart(Survived ~ Sex + Age + Class,
                  data = titanic_data,method="class",
                  control=rpart.control(minsplit=2, minbucket = 1,cp=0))
#################
# checking ######
summary(titanic_rpart)
prp(titanic_rpart, extra=1, uniform=FALSE, branch=1, yesno=FALSE, border.col=0, xsep="/")
#################
# data ##########
adult_men = titanic_data[titanic_data$Sex=="Male" & titanic_data$Age=="Adult",]
all_am = table(adult_men$Class)
survived_am = table(adult_men[adult_men$Survived=="Yes",]$Class)
survived_am/all_am

There are 2 answers

Sainath Adapa

As indicated in a comment to this question, setting cp=-1 will build the full tree.
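
A minimal sketch of that comparison, for anyone who wants to try it. As far as I understand the rpart documentation, a split that does not improve the overall fit by at least a factor of cp is not kept, so cp = 0 can still drop splits that bring no improvement, while a negative cp lifts that restriction. The helper names fit_tree and n_leaves are made up for this example, and the data are rebuilt from the built-in Titanic table with as.data.frame() rather than with the loops from the question.

```r
library(rpart)

# Expand the built-in Titanic contingency table to one row per person.
counts <- as.data.frame(Titanic)   # columns: Class, Sex, Age, Survived, Freq
titanic_data <- counts[rep(seq_len(nrow(counts)), counts$Freq),
                       c("Class", "Sex", "Age", "Survived")]

# Hypothetical helper: fit the model from the question for a given cp.
fit_tree <- function(cp) {
  rpart(Survived ~ Sex + Age + Class,
        data = titanic_data, method = "class",
        control = rpart.control(minsplit = 2, minbucket = 1, cp = cp))
}

# Hypothetical helper: count the terminal nodes of a fitted tree.
n_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

tree_cp0   <- fit_tree(0)
tree_cpneg <- fit_tree(-1)

# Compare the sizes of the two trees.
n_leaves(tree_cp0)
n_leaves(tree_cpneg)
```

If the tree fitted with cp = -1 has more leaves, the extra splits are the ones that cp = 0 discarded; printcp(tree_cpneg) lists them in the CP table.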

F. Tusell

I cannot check right now, but I seem to recall that setting cp=0.000001 or a similarly small number solved the issue for me at some point. Also note that parameters such as minsplit and minbucket can limit the growth of the tree, so you may want to set appropriate values for those as well.
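
To see which limits are in effect, one can inspect the list returned by rpart.control(). The sketch below only prints the defaults next to a relaxed setting; the variable names defaults and relaxed are made up for this example, and cp = -1 is taken from the other answer rather than from anything I have tested on this data.

```r
library(rpart)

# Defaults used when no control argument is supplied
defaults <- rpart.control()
defaults$minsplit    # minimum observations in a node before a split is attempted
defaults$minbucket   # minimum observations in any terminal node
defaults$cp          # complexity parameter

# Relaxed settings that allow splits down to single observations
relaxed <- rpart.control(minsplit = 2, minbucket = 1, cp = -1)
str(relaxed[c("minsplit", "minbucket", "cp")])
```

With the defaults, a node with fewer than 20 observations is never split, no matter how small cp is, which is why these parameters need to be lowered together with cp.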