I'm working on a binary classification problem, and I have an SGDClassifier set up like so:

```python
sgd = SGDClassifier(
    max_iter=1000,
    tol=1e-3,
    validation_fraction=0.2,
    class_weight={0: 0.5, 1: 8.99}
)
```
I fitted it on my training set and plotted the precision-recall curve:

```python
from sklearn.metrics import plot_precision_recall_curve
disp = plot_precision_recall_curve(sgd, X_test, y_test)
```
Given that `SGDClassifier` in scikit-learn uses `loss="hinge"` by default, how is it possible for this curve to be plotted? My understanding is that the output of the SGD classifier is not probabilistic -- it's either 1 or 0. So there are no "thresholds", and yet the scikit-learn precision-recall curve plots a zigzag graph across a range of thresholds. What's going on here?
The situation you describe is practically identical to one found in a documentation example, using the first 2 classes of the iris data and a LinearSVC classifier (that algorithm uses the squared hinge loss which, like the hinge loss you use here, results in a classifier that produces only binary outcomes and not probabilistic ones). The resulting plot there is qualitatively similar to yours here.
Nevertheless, your question is a legitimate one, and a nice catch indeed; how come we get behavior similar to that of probabilistic classifiers, when our classifier does not in fact produce probabilistic predictions (and hence any notion of a threshold sounds irrelevant)?
To see why this is so, we need to do some digging into the scikit-learn source code, starting from the `plot_precision_recall_curve` function used here and following the thread down the rabbit hole... Starting from the source code of `plot_precision_recall_curve`, we find:
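The snippet below is the relevant excerpt as it appears in scikit-learn 0.24 (this code has since been refactored, so the exact lines may differ slightly in other versions):

```python
# inside plot_precision_recall_curve: y_pred comes from _get_response,
# not from the classifier's predict method
y_pred, pos_label = _get_response(
    X, estimator, response_method, pos_label=pos_label)

precision, recall, _ = precision_recall_curve(y, y_pred,
                                              pos_label=pos_label,
                                              sample_weight=sample_weight)
```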
So, for the purposes of plotting the PR curve, the predictions `y_pred` are not produced directly by the `predict` method of our classifier, but by the internal scikit-learn function `_get_response()`.

`_get_response()` in turn includes the lines:
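Again quoting from the 0.24 source (abridged here; the surrounding validation and error-handling lines are omitted):

```python
# _get_response delegates the choice of prediction method, then calls it
prediction_method = _check_classifier_response_method(
    estimator, response_method)
y_pred = prediction_method(X)
```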
which finally leads us to the `_check_classifier_response_method()` internal function; you can check its full source code - what is of interest here are the following 3 lines after the `else` statement:
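These lines read as follows (scikit-learn 0.24; treat the exact formatting as approximate, as it may vary between versions):

```python
# try predict_proba first, and fall back to decision_function if absent
predict_proba = getattr(estimator, 'predict_proba', None)
decision_function = getattr(estimator, 'decision_function', None)
prediction_method = predict_proba or decision_function
```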
By now, you may have started getting the point: under the hood, `plot_precision_recall_curve` checks whether either a `predict_proba()` or a `decision_function()` method is available for the classifier used; and if `predict_proba()` is not available, as in your case here of an SGDClassifier with hinge loss (or the documentation example of a LinearSVC classifier with squared hinge loss), it falls back to the `decision_function()` method instead, in order to calculate the `y_pred` that will subsequently be used for plotting the PR (and ROC) curve.
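For illustration, here is a minimal sketch of performing the same fallback by hand (assuming the fitted `sgd` and the `X_test`, `y_test` arrays from your question); `precision_recall_curve` accepts raw decision-function scores in place of probabilities, and the "thresholds" it returns are simply cut-offs on those scores:

```python
from sklearn.metrics import precision_recall_curve

# real-valued margins (signed distances from the separating hyperplane),
# not probabilities -- but still rankable, which is all the PR curve needs
scores = sgd.decision_function(X_test)

# every distinct score value acts as a candidate threshold
precision, recall, thresholds = precision_recall_curve(y_test, scores)
```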
The above has arguably answered your programming question about how exactly scikit-learn produces the plot and the underlying calculations in such cases; further theoretical inquiries regarding whether (and why) using the `decision_function()` of a non-probabilistic classifier is indeed a correct and legitimate approach to getting a PR (or ROC) curve are out of scope for SO, and should be addressed to Cross Validated, if necessary.