Precision Recall Curves in R


Can someone help me understand the "threshold" (i.e. the color gradient) in this precision-recall curve (produced in R)?

https://i.stack.imgur.com/c64Tw.jpg

R code:

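library(PRROC)
x <- rnorm(1000)        # scores of the positive class (scores.class0)
y <- rnorm(1000, -1)    # scores of the negative class (scores.class1)
pr <- pr.curve(x, y, curve = TRUE)
plot(pr)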

Why does the threshold go from -3 to 3? Doesn't the threshold have to be between 0 and 1? Does anyone know how to fix this (produce a threshold between 0 and 1)?

Thanks!

source: https://cran.rstudio.com/web/packages/PRROC/PRROC.pdf
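A minimal base-R sketch of what such a curve computes may help explain the -3 to 3 range. It assumes, following PRROC's convention, that `x` holds the scores of the positive class (`scores.class0`) and `y` those of the negative class (`scores.class1`); the helper `pr_at_thresholds()` is hypothetical and written only for illustration. Each threshold is simply one of the observed scores, so it lives on the scale of `x` and `y` (roughly -3 to 3 for these normal draws). Because precision and recall at a cut-off depend only on how the points are ranked, passing the scores through a strictly increasing function such as `plogis()` moves the thresholds into (0, 1) while leaving the curve itself unchanged. The rescaled values are not calibrated probabilities, though.

```r
# Hypothetical helper: precision and recall at every observed score,
# calling a point "positive" when its score >= threshold.
pr_at_thresholds <- function(pos, neg) {
  scores <- c(pos, neg)
  is_pos <- c(rep(TRUE, length(pos)), rep(FALSE, length(neg)))
  thr <- sort(unique(scores), decreasing = TRUE)
  tp <- sapply(thr, function(t) sum(scores >= t & is_pos))
  fp <- sapply(thr, function(t) sum(scores >= t & !is_pos))
  data.frame(threshold = thr,
             recall    = tp / length(pos),
             precision = tp / (tp + fp))
}

set.seed(42)
x <- rnorm(1000)      # positive-class scores, as in the question
y <- rnorm(1000, -1)  # negative-class scores, as in the question

raw    <- pr_at_thresholds(x, y)
scaled <- pr_at_thresholds(plogis(x), plogis(y))  # plogis() maps onto (0, 1)

range(raw$threshold)     # same as range(c(x, y)): the raw score scale
range(scaled$threshold)  # now strictly between 0 and 1

# Identical (recall, precision) pairs; only the threshold labels changed
all.equal(raw[, c("recall", "precision")], scaled[, c("recall", "precision")])
```

With PRROC itself, the same idea would be `pr.curve(plogis(x), plogis(y), curve = TRUE)`; a min-max rescaling of the pooled scores would work equally well, since it is also strictly increasing.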

Answer by Maurits Evers

It is not really clear to me what you're trying to do; the figure you link to in the comments shows the precision/recall as a function of varying threshold parameters of a classifier, but you don't show code involving any classification problem.

Let's use the iris data set and construct a simple binary classification problem; to do so, we first remove all data for Species == "setosa".

data <- subset(iris, Species %in% c("versicolor", "virginica"))
data <- droplevels(data)

We then use a simple SVM with a Gaussian (radial) kernel as our classifier. Here we vary only one parameter, gamma (the cost parameter is left at its default). We compute the precision and recall of the classifier for a range of gamma values and plot the results:

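library(e1071)      # svm()
library(caret)      # confusionMatrix()
library(tidyverse)  # map_dfr(), pluck(), enframe(), filter(), mutate(), ggplot()

# Grid of gamma values, log-spaced from 10^-3 to 10^2
gamma_values <- 10^(seq(-3, 2, by = 0.1))

# For each gamma: fit the SVM, predict on the training data, and keep the
# precision and recall from caret's confusion matrix (positive class =
# first factor level, "versicolor")
df <- map_dfr(
    gamma_values,
    function(param)
        svm(Species ~ ., data = data, gamma = param, kernel = "radial") %>%
            predict() %>%
            confusionMatrix(data$Species, mode = "prec_recall") %>%
            pluck("byClass") %>%
            enframe() %>%
            filter(name %in% c("Precision", "Recall")) %>%
            mutate(gamma_param = param))

# Precision and recall as a function of gamma (log-scaled x axis)
ggplot(df, aes(gamma_param, value, colour = name)) +
    geom_line() +
    scale_x_log10() +
    expand_limits(y = c(0, 1)) +
    theme_minimal()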


Note that which parameters you can tune (and the values they can take) depends on your classifier; in this simple example we vary only a single parameter.
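The sweep above changes a model parameter, and each fitted SVM makes hard class predictions. The curve in the question instead keeps one model fixed and sweeps a threshold over its scores; when those scores are predicted probabilities, the threshold necessarily lies in [0, 1]. Below is a minimal base-R sketch of that alternative. It swaps in a logistic regression on `Petal.Width` for the SVM purely because `glm()` returns probabilities directly; this is an illustrative choice, not part of the approach above.

```r
# Same binary problem as above: versicolor vs virginica
data <- droplevels(subset(iris, Species %in% c("versicolor", "virginica")))

# Logistic regression (illustrative stand-in for the SVM). With a factor
# response, glm() models the probability of the second level, "virginica".
fit    <- glm(Species ~ Petal.Width, data = data, family = binomial)
prob   <- fitted(fit)
is_pos <- data$Species == "virginica"

# Sweep the threshold over the observed probabilities (prob >= t => positive)
thr <- sort(unique(prob), decreasing = TRUE)
tp  <- sapply(thr, function(t) sum(prob >= t & is_pos))
fp  <- sapply(thr, function(t) sum(prob >= t & !is_pos))
pr_df <- data.frame(threshold = thr,
                    recall    = tp / sum(is_pos),
                    precision = tp / (tp + fp))

range(pr_df$threshold)  # lies inside [0, 1]
plot(pr_df$recall, pr_df$precision, type = "s", ylim = c(0, 1),
     xlab = "Recall", ylab = "Precision")
```

At the lowest threshold every point is called positive, so recall reaches 1 and precision falls to the class balance (50 of 100 points here).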

P.S. I used ggplot2 and other tidyverse functions out of habit and convenience; you can do something similar in base R as well.