I would like to study the optimal tradeoff between bias/variance for model tuning. I'm using caret for R which allows me to plot the performance metric (AUC, accuracy...) against the hyperparameters of the model (mtry, lambda, etc.) and automatically chooses the max. This typically returns a good model, but if I want to dig further and choose a different bias/variance tradeoff I need a learning curve, not a performance curve.
For the sake of simplicity, let's say my model is a random forest, which has just one hyperparameter 'mtry'
I would like to plot the learning curves of both training and test sets. Something like this:
(red curve is the test set)
On the y axis I put an error metric (number of misclassified examples or something like that); on the x axis 'mtry' or alternatively the training set size.
Questions:
Has caret the functionality to iteratively train models based of training set folds different in size? If I have to code by hand, how can I do that?
If I want to put the hyperparameter on the x axis, I need all the models trained by caret::train, not just the final model (the one with maximum performance got after CV). Are these "discarded" model still available after train?
At some point, probably after this question was asked, the caret package added the
learning_curve_dat
function which helps assess model performance across a range of training set sizes.Here is the example from the function documentation:
The performance metric(s) are found for each Training_Size and saved in lda_data along with the Data variable ("Resampling", "Training", and optionally "Testing").
Here is a link to the function documentation: https://rdrr.io/cran/caret/man/learning_curve_dat.html
To be clear, this answers the first part of the question but not the second part.
NOTE Before at least August 2020 there was a typo in the caret package code and documentation. The function call was
learing_curve_dat
before it was corrected tolearning_curve_dat
. I've updated my answer to reflect this change. Make sure you are using a recent version of the caret package.