I trained an ensemble model (RUSBoost) for a binary classification problem using the function fitensemble() in Matlab 2014a. Training is performed with 10-fold cross-validation, which I requested through the "kfold" input parameter of fitensemble().
However, the model returned by this function cannot be used to predict the labels of new data with predict(model, Xtest). I checked the Matlab documentation, which says the kfoldPredict() function can be used to evaluate the trained model, but I did not find any way to pass new data to that function. I also found that the structure of a model trained with cross-validation is different from that of a model trained without it. So, could anyone please advise me how to use a model trained with cross-validation to predict the labels of new data? Thanks!
kfoldPredict() needs a RegressionPartitionedModel or ClassificationPartitionedEnsemble object as input. This already contains the models and data for k-fold cross-validation. The RegressionPartitionedModel object has a field Trained, in which the trained learners that are used for cross-validation are stored. You can take any of these learners and use it like predict(learner, Xdata).

Edit: If k is too large, it is possible that there is too little meaningful data in one or more iterations, so the model for that iteration is less accurate. There are no general rules for k, but k=10, like the MATLAB default, is a good starting point to play around with. Maybe this is also interesting for you: https://stats.stackexchange.com/questions/27730/choice-of-k-in-k-fold-cross-validation
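To make the Trained approach concrete, here is a minimal sketch. The data (X, Y, Xnew), the number of learners (50) and the variable names are all hypothetical and chosen only for illustration. The last two lines show a common alternative that is not part of the answer above: train a separate ensemble on all of the data without 'kfold' and use that one for new data, keeping the cross-validated model only for the error estimate.

```matlab
% Hypothetical synthetic data: 200 observations, 4 predictors, and an
% imbalanced binary label. Replace with your own X, Y and new data.
rng(1);                                        % reproducible random numbers
X    = randn(200, 4);
Y    = double(X(:,1) + 0.5*randn(200,1) > 1);  % minority class is 1
Xnew = randn(10, 4);                           % stand-in for new, unlabeled data

% Cross-validated ensemble (a ClassificationPartitionedEnsemble).
cvmodel = fitensemble(X, Y, 'RUSBoost', 50, 'Tree', 'kfold', 10);

% kfoldPredict and kfoldLoss only use the training data stored in cvmodel:
% each observation is predicted by the fold model that did not see it.
cvLabels = kfoldPredict(cvmodel);
cvError  = kfoldLoss(cvmodel);

% The ensembles trained on each fold are stored in the Trained cell array.
% Any one of them can be applied to new data with predict().
foldModel = cvmodel.Trained{1};
newLabels = predict(foldModel, Xnew);

% Alternative (an assumption, not from the answer): fit on all data
% without 'kfold' and predict with that model.
finalModel  = fitensemble(X, Y, 'RUSBoost', 50, 'Tree');
finalLabels = predict(finalModel, Xnew);
```

Note that each model in Trained has only seen about 9/10 of the training data, so predictions can differ slightly from fold to fold and from the model fitted on everything.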