I've successfully fitted a classification tree in rpart on 0-1 outcome data, where I have weighted the observations to deal with the problem of a scarce response. When I plot the tree using prp, I want the node labels to show the true proportions rather than the weighted proportions. Is this possible?
A sample data set is below (note that my real analysis uses many more predictors than this example):
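library(rpart)
library(rpart.plot)
set.seed(1001)
x <- rnorm(1000)
y <- rbinom(1000, size = 1, prob = 1 / (1 + exp(-x)))
z <- 10 + rnorm(1000)
# Upweight the y == 1 cases: weight z (about 10) versus 1 for y == 0
weights <- ifelse(y == 0, 1, z)
# Fit the weighted tree with no complexity penalty, then prune at the lowest cross-validated error
rpartfun <- rpart(y ~ x, weights = weights, method = "class", control = rpart.control(cp = 0))
rparttrim <- prune(rpartfun, cp = rpartfun$cptable[which.min(rpartfun$cptable[, "xerror"]), "CP"])
# extra = 104: show per-class probabilities plus the percentage of observations in each node
prp(rparttrim, extra = 104)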
[I would produce the image I get from that here, but I don't have enough reputation]
I would like that first node (and indeed all the nodes) to show 0.65 to 0.35 (the true proportions) instead of .28 to .72 (the weighted proportions).
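To make the target concrete, the unweighted proportions can at least be computed by hand for the leaves. This is only a minimal sketch, not a way of changing the prp labels: it assumes that the `where` component of the fitted rpart object gives, for each observation, the row of `frame` for the leaf it falls into. The names `w`, `fit`, `leaf` and `true_prop` are illustrative only, and the simulated data will not reproduce the numbers quoted above.

```r
library(rpart)

# Simulated data in the same spirit as the example above
set.seed(1001)
x <- rnorm(1000)
y <- rbinom(1000, size = 1, prob = 1 / (1 + exp(-x)))
w <- ifelse(y == 0, 1, 10 + rnorm(1000))  # illustrative case weights

fit <- rpart(y ~ x, weights = w, method = "class",
             control = rpart.control(cp = 0))

# Map each observation to the node number of its leaf
# (assumes fit$where indexes rows of fit$frame)
leaf <- rownames(fit$frame)[fit$where]

# Unweighted class proportions within each leaf (each row sums to 1)
true_prop <- prop.table(table(leaf = leaf, y = y), margin = 1)
print(round(true_prop, 2))

# Unweighted class proportions at the root, i.e. over all observations
root_prop <- prop.table(table(y))
print(round(root_prop, 2))
```

Internal nodes are not covered here; their counts would have to be aggregated from the leaves beneath them.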