The difference between them is not well explained in the `make_scorer` documentation. I observed that if `needs_proba` or `needs_threshold` is set to `True`, the scoring function receives the output of `predict_proba` instead of `y_pred`. However, it is not possible to set both to `True`; doing so raises
```
ValueError: Set either needs_proba or needs_threshold to True, but not both
```
The documentation for `needs_threshold` says:

> For example average_precision or the area under the roc curve can not be computed using discrete predictions alone.

which I understood to mean that `needs_threshold` should be set to `True` if the scoring is `average_precision` or `roc_auc_score`. However, it works the same whether `needs_threshold` is `True` or `False`.
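For reference, here is a minimal sketch of the comparison I am making (assuming an older scikit-learn release where `make_scorer` still accepts these parameters; in recent versions they were replaced by `response_method`):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
clf = LogisticRegression()

# Scorer that asks the estimator for continuous scores
# (decision_function, falling back to predict_proba)
scorer_thresh = make_scorer(roc_auc_score, needs_threshold=True)
# Scorer that asks the estimator for probability estimates
scorer_proba = make_scorer(roc_auc_score, needs_proba=True)

print(cross_val_score(clf, X, y, scoring=scorer_thresh))
print(cross_val_score(clf, X, y, scoring=scorer_proba))
# For LogisticRegression both print identical AUCs, since its
# decision_function and predict_proba rank samples identically.
```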
Can you help me understand the difference between them and the usage of `needs_threshold`?
Per the note further down the docs page, `needs_threshold` tries `decision_function` before falling back to `predict_proba`. For rank-ordering metrics like `roc_auc_score` and `average_precision`, there won't be a difference.
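In other words, the scorer obtains its continuous scores roughly like this (a simplified sketch, not scikit-learn's exact implementation):

```python
import numpy as np

def get_continuous_scores(estimator, X):
    """Sketch of how a needs_threshold scorer gets continuous scores."""
    try:
        # Preferred: raw scores from the decision function
        y_score = estimator.decision_function(X)
    except AttributeError:
        # Fallback: positive-class probability estimates
        y_score = estimator.predict_proba(X)[:, 1]
    return np.asarray(y_score)
```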
I suppose you could want a metric that takes either the raw decision function output or the (calibrated?) probability outputs. For example, in an SVC the decision function is the signed distance from the separating hyperplane, which you might want to average over the misclassified examples; alternatively, you might want a metric that uses the resulting class probabilities (obtained via Platt calibration, which happens internally when the SVC's `probability=True`).
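To make that concrete, here is a sketch (again using the pre-1.4 parameter names) showing that for an `SVC` with `probability=True` the two flags feed the metric different inputs: the raw margin from `decision_function` versus the Platt-calibrated probabilities, which need not rank samples identically:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer, roc_auc_score
from sklearn.svm import SVC

X, y = make_classification(random_state=0)
svc = SVC(probability=True, random_state=0).fit(X, y)

# Scores from the decision function: signed distances to the hyperplane
auc_margin = roc_auc_score(y, svc.decision_function(X))
# Scores from predict_proba: Platt calibration, fit internally via
# cross-validation, so its ranking can differ slightly from the margin
auc_proba = roc_auc_score(y, svc.predict_proba(X)[:, 1])

print(auc_margin, auc_proba)  # the two AUCs can differ slightly

# The corresponding scorers for cross_val_score / GridSearchCV:
scorer_margin = make_scorer(roc_auc_score, needs_threshold=True)
scorer_proba = make_scorer(roc_auc_score, needs_proba=True)
```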