I split my data into a Train data set and a Test data set.
I used the `rpart` package to fit a CART model (classification tree) in R, using the training set only. Now I want to carry out a ROC analysis using the `ROCR` package.
The response variable is `n.use` (1 = yes, 0 = no):
> Pred2 = prediction(Pred.cart, Test$n.use)
Error in prediction(Pred.cart, Test$n.use) :
**Format of predictions is invalid.**
This is my code. What is the problem? And which is the right type, `"class"` or `"prob"`?
library(rpart)
train.cart = rpart(n.use~., data=Train, method="class")
Pred.cart = predict(train.cart, newdata = Test, type = "class")
Pred2 = prediction(Pred.cart, Test$n.use)
roc.cart = performance(Pred2, "tpr", "fpr")
The `prediction()` function from the `ROCR` package expects the predicted "success" probabilities and the observed factor of failures vs. successes. To obtain the former, you need to apply `predict(..., type = "prob")` to the `rpart` object (i.e., not `"class"`). With `type = "class"`, `predict()` returns a factor of predicted classes, which `prediction()` rejects with the error you saw. However, as `type = "prob"` returns a matrix of probabilities with one column per response class, you need to select the "success" class column.

As your example, unfortunately, is not reproducible, I'm using the `kyphosis` data from the `rpart` package for illustration.

Then you can apply the `prediction()` function from `ROCR`. Here, I'm using the in-sample (training) data, but the same can be applied out of sample (test data).

And you can visualize the ROC curve:
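A minimal sketch of these steps, from fitting the tree to plotting the ROC curve. The object names `rp`, `prob`, and `pred` are just my choices, and the "success" column is picked by its level name `"present"`, which is specific to the `kyphosis` data:

```r
library(rpart)
library(ROCR)

# Example data shipped with rpart; Kyphosis is a factor with
# levels "absent" and "present"
data("kyphosis", package = "rpart")

# A factor response makes rpart fit a classification tree
rp <- rpart(Kyphosis ~ ., data = kyphosis)

# type = "prob" returns a matrix with one column per class
prob <- predict(rp, type = "prob")

# Keep only the "success" column and pair it with the observed labels
pred <- prediction(prob[, "present"], kyphosis$Kyphosis)

# ROC curve: true positive rate vs. false positive rate
plot(performance(pred, "tpr", "fpr"))
abline(0, 1, lty = 2)  # reference line for a random classifier
```

In your case the success column would presumably be the one named `"1"`, given that `n.use` is coded as 0/1.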
Or the accuracy across cutoffs:
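For example, something along these lines should work. It refits the same illustrative tree on `kyphosis` so that the snippet runs on its own, and `pred` and `acc` are again just names I picked:

```r
library(rpart)
library(ROCR)

data("kyphosis", package = "rpart")
rp <- rpart(Kyphosis ~ ., data = kyphosis)

# Predicted probability of the "present" class vs. observed labels
pred <- prediction(predict(rp, type = "prob")[, "present"],
                   kyphosis$Kyphosis)

# "acc" gives the accuracy at each possible probability cutoff
acc <- performance(pred, "acc")
plot(acc)
```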
Or any of the other plots and summaries supported by `ROCR`.
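For the out-of-sample case from your question, a sketch could look like this. Since I don't have your `Train` and `Test` data, the split of `kyphosis` below (every third row held out) is purely a stand-in, and all object names are hypothetical:

```r
library(rpart)
library(ROCR)

data("kyphosis", package = "rpart")

# Stand-in for your Train/Test split: hold out every third row
# (an arbitrary choice for illustration only)
test_idx <- seq(3, nrow(kyphosis), by = 3)
train <- kyphosis[-test_idx, ]
test  <- kyphosis[test_idx, ]

# Fit on the training data only
fit <- rpart(Kyphosis ~ ., data = train, method = "class")

# Probabilities (not classes) for the test data, "success" column only
prob_test <- predict(fit, newdata = test, type = "prob")[, "present"]

# Pair the test predictions with the observed test labels
pred_test <- prediction(prob_test, test$Kyphosis)

roc_test <- performance(pred_test, "tpr", "fpr")
plot(roc_test)

# Area under the ROC curve as a single-number summary
auc <- performance(pred_test, "auc")@y.values[[1]]
```

With your data, the corresponding calls would be `predict(train.cart, newdata = Test, type = "prob")` followed by selecting the column for `n.use == 1`.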