I have a multiclass classification problem and want to build a precision-recall curve using pr_curve() from the yardstick library in R. This function requires a tibble that contains a probability column for each class, like the example data set hpc_cv (see data(hpc_cv)).
How do I get there from my classification results, stored as columns in a tibble?
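library(yardstick)
library(dplyr)  # provides tibble() and %>%
# predicted = class assigned by the model, expected = true class
data <- tibble(predicted = as.factor(c("A", "A", "B", "B", "C", "C")),
               expected = as.factor(c("A", "B", "B", "C", "A", "C")))
data %>% conf_mat(truth = expected, estimate = predicted)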
I have not found a function in yardstick (or elsewhere) that calculates these class probabilities.
I am not sure how class probabilities are calculated; I am thinking along these lines:
data %>% filter(predicted == "A") %>% summarise(n = n() / 6)
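For reference, the quantity in this guess (the share of rows predicted as a given class) can be computed for all classes at once with dplyr's count(), without a loop. The sketch below reuses the toy data from the question. It yields one number per class, not one probability per row, so it cannot trace out a curve. What hard labels do support is single-point precision and recall, shown here with yardstick's precision() and recall() using macro averaging (the choice of "macro" is an assumption for illustration).

```r
library(dplyr)
library(yardstick)

data <- tibble(predicted = factor(c("A", "A", "B", "B", "C", "C")),
               expected  = factor(c("A", "B", "B", "C", "A", "C")))

# Share of predictions per class for every class in one step.
# For "A" this equals filter(predicted == "A") %>% summarise(n = n() / 6).
pred_share <- data %>%
  count(predicted) %>%
  mutate(prop = n / sum(n))
pred_share

# Hard labels give one precision and one recall value (here macro-averaged
# over the classes), not a curve
prec <- precision(data, truth = expected, estimate = predicted, estimator = "macro")
rec  <- recall(data, truth = expected, estimate = predicted, estimator = "macro")
prec
rec
```

With this toy data every class is predicted twice (prop = 1/3), and each class has one correct prediction out of two, so both macro precision and macro recall come out at 0.5.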
Is this correct? If so, is there a clean way to do this without for-loops over each class in each fold, and to get a tibble shaped like hpc_cv?
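Class probabilities cannot be recovered from the hard class predictions after the fact. They are generated by a specific model, which assigns each individual data point a probability for every class.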
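As a minimal sketch of where such probabilities come from, the block below fits a multinomial regression with parsnip and asks for type = "prob" predictions. The iris data and the multinom_reg()/"nnet" model are assumptions standing in for the asker's real data and model; any classification model that supports probability predictions would do.

```r
library(tidymodels)  # attaches parsnip, yardstick, dplyr, ...

# Assumption: iris and a multinomial regression (nnet engine) are stand-ins
mn_fit <- multinom_reg() %>%
  set_engine("nnet") %>%
  fit(Species ~ ., data = iris)

# One probability column per class (.pred_<level>), one row per observation
probs <- predict(mn_fit, new_data = iris, type = "prob")
head(probs)

# Truth column plus one probability column per level: the same layout as hpc_cv
pr_data <- bind_cols(iris %>% select(Species), probs)
curves <- pr_curve(pr_data, truth = Species, .pred_setosa:.pred_virginica)
head(curves)
```

In practice the predictions would be made on held-out data rather than the training set; predicting on iris itself just keeps the sketch short.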
PR curves (and precision and recall) are defined for outcomes with two classes. For a multiclass outcome, you can still use multiclass averaging to get an overall PR AUC.
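A short sketch on the hpc_cv data that ships with yardstick: given a multiclass truth column, pr_curve() computes one one-vs-all curve per class (labelled by .level), and pr_auc() with estimator = "macro" averages the per-class areas into one number. Grouping by the Resample column gives one value per fold without any loop.

```r
library(yardstick)
library(dplyr)

data(hpc_cv)  # obs = true class, pred = predicted class, VF:L = class probabilities

# One-vs-all: one PR curve per class level (see the .level column)
curves <- hpc_cv %>% pr_curve(obs, VF:L)

# Macro averaging: per-class PR AUCs averaged into a single estimate
overall <- hpc_cv %>% pr_auc(obs, VF:L, estimator = "macro")

# The same metric computed separately for each resampling fold
per_fold <- hpc_cv %>%
  group_by(Resample) %>%
  pr_auc(obs, VF:L, estimator = "macro")
per_fold
```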
I would advise reading the tidymodels book for a bit before proceeding.
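Since the question asks about folds, here is one possible way to get an hpc_cv-shaped table from cross-validation in tidymodels: fit_resamples() with save_pred = TRUE keeps the out-of-fold class probabilities, and collect_predictions() returns them with a fold id column. The iris data, 5 folds, and the multinomial model are illustrative assumptions, not the asker's setup.

```r
library(tidymodels)

set.seed(123)
# Assumption: 5-fold CV on iris, stratified by the outcome
folds <- vfold_cv(iris, v = 5, strata = Species)

wf <- workflow() %>%
  add_formula(Species ~ .) %>%
  add_model(multinom_reg() %>% set_engine("nnet"))

# save_pred = TRUE keeps the held-out predictions, including class probabilities
res <- fit_resamples(wf, resamples = folds,
                     control = control_resamples(save_pred = TRUE))

# One row per held-out observation: id (fold), .pred_<level> columns, Species, ...
oof <- collect_predictions(res)

# Macro-averaged PR AUC per fold, analogous to grouping hpc_cv by Resample
per_fold_auc <- oof %>%
  group_by(id) %>%
  pr_auc(Species, .pred_setosa:.pred_virginica)
per_fold_auc
```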
Created on 2022-12-09 with reprex v2.0.2