I feel like I'm missing something very basic here.
I've run a random forest regression:
INTERP.rf <- randomForest(y ~ ., data = df, importance = TRUE, mtry = 3, ntree = 300)
and then extracted the predictions for the training set:
rf.predict <- predict(INTERP.rf, df, type = "response")
The % Var explained reported by the model looked too low, so I checked it by computing the MSE directly:
MSE.rf <- mean((rf.predict - df$y)^2)
...and got a wildly different answer from the one shown when printing INTERP.rf.
Please can someone highlight my error?
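The discrepancy is easy to reproduce. The following is a minimal sketch, assuming a simulated data frame df with a numeric response y and four predictors (the data are invented for illustration; the original df was not shown). It compares the in-sample MSE from predict(INTERP.rf, df) with INTERP.rf$mse, whose last element is the value print() shows as "Mean of squared residuals".

```r
library(randomForest)

# Hypothetical stand-in for the question's df, which was not shown
set.seed(1)
n  <- 200
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n), x4 = runif(n))
df$y <- 3 * df$x1 + sin(6 * df$x2) + rnorm(n, sd = 0.5)

INTERP.rf <- randomForest(y ~ ., data = df, importance = TRUE,
                          mtry = 3, ntree = 300)

# Passing df as newdata: all 300 trees vote on every row, including
# trees whose bootstrap sample contained that row
rf.predict <- predict(INTERP.rf, df, type = "response")
mse.train  <- mean((rf.predict - df$y)^2)

# What print(INTERP.rf) reports as "Mean of squared residuals":
# the out-of-bag MSE after the final tree
mse.oob <- INTERP.rf$mse[INTERP.rf$ntree]

cat("In-sample MSE:", mse.train, "\n")
cat("OOB MSE:      ", mse.oob, "\n")
```

On data like this the in-sample MSE typically comes out well below the OOB MSE, which is why a % Var explained computed from it looks much better than the one the model reports.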
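The correct way to do this: I was not aware that, to get the OOB (out-of-bag) predictions, I needed to use
predict(model)
with no new data (this dispatches to predict.randomForest), as opposed to predict(model, trainingData).
When the training data are passed in, every tree predicts every row, including the rows it was grown on, so the resulting error is optimistic. The "Mean of squared residuals" and "% Var explained" printed for the model are computed from the OOB predictions instead.
Thank you to @joran and @Vlo for helpful comments.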
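As a rough check of the answer, the sketch below recomputes the printed figures from the OOB predictions. It assumes hypothetical simulated data (the original df was not shown) and that every row was out-of-bag for at least one tree, which is practically certain with 300 trees and 200 rows. The % Var explained is recomputed as 1 minus the OOB MSE divided by the population variance of y, which is how the package documents its "pseudo R-squared" (rsq).

```r
library(randomForest)

# Hypothetical toy data; the original df was not shown
set.seed(42)
n  <- 200
df <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
df$y <- 2 * df$x1 - df$x2 + rnorm(n, sd = 0.3)

model <- randomForest(y ~ ., data = df, mtry = 2, ntree = 300)

# No newdata: predict() returns the stored OOB predictions (model$predicted)
oob.pred <- predict(model)

# OOB MSE and % Var explained, recomputed by hand
mse.oob <- mean((oob.pred - df$y)^2)
pct.var <- 100 * (1 - mse.oob / mean((df$y - mean(df$y))^2))

# Compare with the values print(model) reports after the final tree
cat("OOB MSE:", mse.oob, " vs model$mse:", model$mse[model$ntree], "\n")
cat("% Var explained:", pct.var, " vs model$rsq:", 100 * model$rsq[model$ntree], "\n")
```

Using predict(model) rather than reading model$predicted directly keeps the code unchanged if the model was fitted with an na.action that drops rows, since the predict method handles that case.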