Strange behavior of the predict() function


I am currently taking the "Practical Machine Learning" course on Coursera and have run across some strange behavior with the predict function. The assignment asked us to train a tree and then make some predictions. So that I am not posting the answer here, I have changed the dataset used for the problem. The code is as follows:

rm(list = ls())
library(caret)   # provides train()
library(rattle)  # provides fancyRpartPlot()
data(mtcars)
mtcars$vs = as.factor(mtcars$vs)
set.seed(125)
model = train(am ~ ., method = 'rpart', data = mtcars)
print(model)
fancyRpartPlot(model$finalModel)

# Build a one-row data frame with the same columns as mtcars,
# set every value to NA, then fill in only wt
sampleData = mtcars[1,]
sampleData[1,names(sampleData)] = rep(NA, length(names(sampleData)))
sampleData[1, c('wt')] = c(4)
predict(model, sampleData[1,], verbose = TRUE)

In the above code, there are two primary sections. The first builds the tree and the second (where sampleData starts) creates a small sample set of data to apply the model to. To make sure that I have the exact same structure as the original data I simply copy the first row of the training dataset and then set all the columns to NA. I then put data in only the columns that the decision tree needs (in this case the wt variable).
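To see what the row built this way actually contains, here is a minimal base-R sketch (no caret or rattle needed) that rebuilds it and checks whether it is a complete case. The name `complete` is hypothetical. Any step that drops incomplete rows, as `na.omit()` does, is left with zero rows to score, which would be consistent with the empty `numeric(0)` result.

```r
# Rebuild the one-row "template" from mtcars using base R only
data(mtcars)
sampleData <- mtcars[1, ]
sampleData[1, ] <- NA        # every column set to missing...
sampleData$wt <- 4           # ...except wt, the variable the tree splits on

# na.omit() removes any row containing an NA, so nothing survives
complete <- na.omit(sampleData)
print(nrow(complete))               # 0
print(complete.cases(sampleData))   # FALSE
```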

When I execute the above code, I get the following result:

Number of training samples: 32 
Number of test samples:     0 

rpart : 0 unknown predictions were added

numeric(0)

For reference, the following is the structure of the tree:

fancyRpartPlot(model$finalModel)


Can somebody help me to understand why the predict function is not returning a predicted value for the sampleData that I provided?



Accepted answer by topepo:

Unfortunately, even though rpart only used the wt variable in its splits, prediction still requires the other predictors to be present. Use a data set with the same columns filled in:

> predict(model, mtcars[1,])
[1] 0.8571429
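Following that advice, a minimal sketch of the workaround: start from a complete row and overwrite only the value you want to vary, so every predictor is present. To stay self-contained it fits the tree with `rpart::rpart()` directly instead of `caret::train()`, so the tree (and the predicted value) may differ from the tuned model above; `fit`, `newRow`, and `pred` are hypothetical names.

```r
library(rpart)   # recommended package shipped with standard R installs

data(mtcars)
# am is numeric here, so rpart grows a regression ("anova") tree
fit <- rpart(am ~ ., data = mtcars)

newRow <- mtcars[1, ]   # a complete row: every predictor has a value
newRow$wt <- 4          # change only the variable of interest

pred <- predict(fit, newRow)
print(pred)
```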

Max