I am currently taking the "Practical Machine Learning" course on Coursera and have run across some strange behavior with the predict function. The question asked us to train a tree and then make some predictions. So that I am not posting the answer here, I have changed the dataset used for the problem. The code is as follows:
rm(list = ls())
library(caret)   # train() and the predict method for train objects
library(rattle)  # fancyRpartPlot()
data(mtcars)
mtcars$vs = as.factor(mtcars$vs)
set.seed(125)
# Build the tree
model = train(am ~ ., method = 'rpart', data = mtcars)
print(model)
fancyRpartPlot(model$finalModel)
# Build a one-row sample: copy a training row, blank it out, then set wt only
sampleData = mtcars[1,]
sampleData[1,names(sampleData)] = rep(NA, length(names(sampleData)))
sampleData[1, c('wt')] = c(4)
predict(model, sampleData[1,], verbose = TRUE)
In the above code, there are two primary sections. The first builds the tree, and the second (where sampleData starts) creates a small sample set of data to apply the model to. To make sure that I have exactly the same structure as the original data, I simply copy the first row of the training dataset and then set all the columns to NA. I then put data in only the columns that the decision tree needs (in this case the wt variable).
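As a quick sanity check on that construction, the sketch below (base R only, with the hypothetical helper variable nonMissing) rebuilds the sample row and confirms that the column types survive the NA assignment and that wt is the only column holding a value. It does not involve the model at all.

```r
data(mtcars)
mtcars$vs <- as.factor(mtcars$vs)

# Rebuild the one-row sample exactly as in the code above
sampleData <- mtcars[1, ]
sampleData[1, names(sampleData)] <- rep(NA, length(names(sampleData)))
sampleData[1, "wt"] <- 4

# Column classes should match the training data
str(sampleData)

# Names of the columns that still hold a non-missing value
nonMissing <- names(sampleData)[colSums(!is.na(sampleData)) > 0]
print(nonMissing)
```

So the structure matches the training data, but every predictor other than wt is missing in this row.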
When I execute the above code, I get the following result:
Number of training samples: 32
Number of test samples: 0
rpart : 0 unknown predictions were added
numeric(0)
For reference, the following is the structure of the tree:
fancyRpartPlot(model$finalModel)
Can somebody help me understand why the predict function is not returning a predicted value for the sampleData that I provided?
Unfortunately, even though rpart only used the wt variable in splits, prediction still requires the others to be present. Use a data set with the same columns.

Max
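A minimal sketch of that advice, assuming the model from the question: start from a complete training row so that every predictor is present and non-missing, then change only wt. The names naRow and pred are introduced here for illustration. The na.omit line shows, in base R, how a row with missing values disappears when incomplete rows are dropped; that this is what happens by default inside predict for a train object is my assumption, not something stated in the answer.

```r
library(caret)  # train() and predict() for train objects

data(mtcars)
mtcars$vs <- as.factor(mtcars$vs)
set.seed(125)
model <- train(am ~ ., method = "rpart", data = mtcars)

# Illustration: a row with wt set but everything else NA has zero rows
# left once incomplete cases are omitted.
naRow <- mtcars[1, ]
naRow[1, setdiff(names(naRow), "wt")] <- NA
print(nrow(na.omit(naRow)))

# Keep all columns filled in and override only the variable of interest.
sampleData <- mtcars[1, ]
sampleData$wt <- 4

pred <- predict(model, sampleData)
print(pred)
```

The values in the other columns are just those of the first training row. If the tree really splits on wt alone, they should not affect the prediction, but they still have to be present.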