I want to define my custom metric function in caret, but in this function I want to use additional information that is not used for training.
I therefore need to have the indices (row numbers) of the data that is used in this fold for validation.
Here is a silly example:
generate data:
library(caret)
set.seed(1234)
x <- matrix(rnorm(10),nrow=5,ncol=2 )
y <- factor(c("y","n","y","y","n"))
priors <- c(1,3,2,7,9)
This is my example metric function; it should use information from the priors vector:
my.metric <- function(data,
                      lev = NULL,
                      model = NULL) {
  out <- priors[-->INDICES.OF.DATA<--] + data$pred/data$obs
  names(out) <- "MYMEASURE"
  out
}
myControl <- trainControl(summaryFunction = my.metric, method = "repeatedcv", number = 10, repeats = 2)
fit <- train(y = y, x = x, metric = "MYMEASURE", method = "gbm", trControl = myControl)
To make this perhaps even clearer: I could use this in a survival setting where priors are days, and use them in a Surv object to measure survival AUC in the metric function.
How can I do this in caret?
You can access the row numbers using data$rowIndex. Note that the summary function should return a single number as its metric (e.g. ROC, Accuracy, RMSE...). The above function seems to return a vector of length equal to the number of observations in the held-out CV data.
If you're interested in seeing the resamples along with their predictions, you can add print(data) to the my.metric function.
Here's an example using your data (enlarged a bit) and Metrics::auc as the performance measure after multiplying the predicted class probabilities with the prior:
I don't know too much about survival analysis, but I hope this helps.
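The approach described above can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the exact code the answer had in mind: the data are enlarged to a hypothetical 100 rows, the column names x1 and x2 are made up, and a small base-R helper (rank_auc, a hypothetical name) stands in for Metrics::auc so that the summary function runs without extra packages. The mock fold data frame only imitates what caret hands to the summary function; the real training run at the end is executed only if caret and gbm are installed.

```r
set.seed(1234)
n <- 100
x <- matrix(rnorm(2 * n), nrow = n, ncol = 2,
            dimnames = list(NULL, c("x1", "x2")))  # hypothetical column names
y <- factor(sample(c("y", "n"), n, replace = TRUE))
priors <- runif(n)  # side information, one value per row, not used for fitting

# Rank-based (Mann-Whitney) AUC. Assumption: used here in place of
# Metrics::auc(actual, predicted) to keep the sketch in base R.
rank_auc <- function(actual, score) {
  n_pos <- sum(actual == 1)
  n_neg <- sum(actual == 0)
  (sum(rank(score)[actual == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# caret passes the held-out predictions of one resample in `data`;
# data$rowIndex holds the row numbers of those observations.
my.metric <- function(data, lev = NULL, model = NULL) {
  # predicted probability of class "y" multiplied by the matching prior
  weighted <- data[["y"]] * priors[data$rowIndex]
  out <- rank_auc(as.integer(data$obs == "y"), weighted)
  names(out) <- "MYMEASURE"
  out  # a single named number, as a summary function should return
}

# Mock of the data frame a summary function receives for one fold
# (with classProbs = TRUE there is one probability column per class level).
fold <- data.frame(
  obs = factor(c("y", "n", "y", "n"), levels = c("n", "y")),
  pred = factor(c("y", "n", "y", "n"), levels = c("n", "y")),
  n = c(0.1, 0.8, 0.4, 0.6),
  y = c(0.9, 0.2, 0.6, 0.4),
  rowIndex = c(3, 10, 25, 40)
)
print(my.metric(fold))

# The actual resampling run, only attempted when caret and gbm are available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("gbm", quietly = TRUE)) {
  library(caret)
  myControl <- trainControl(summaryFunction = my.metric,
                            method = "repeatedcv", number = 10, repeats = 2,
                            classProbs = TRUE)  # needed for the "y" column
  fit <- train(x = x, y = y, metric = "MYMEASURE", method = "gbm",
               trControl = myControl, verbose = FALSE)
  print(fit$results[, c("n.trees", "interaction.depth", "MYMEASURE")])
}
```

Because my.metric looks up priors in the global environment, this sketch assumes sequential resampling; with a parallel backend the vector would have to be made available to the workers.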