I have seen this hinge loss chart:
https://math.stackexchange.com/questions/782586/how-do-you-minimize-hinge-loss
And also here:
https://programmathically.com/understanding-hinge-loss-and-the-svm-cost-function/
However, when I create the "same" graph using scikit-learn, the shape is similar but appears to be the "opposite". The code is as follows:
from sklearn.metrics import hinge_loss
import matplotlib.pyplot as plt
import numpy as np
predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
And some specific points in the chart above:
hinge_loss([1], [-5]) = 0.0,
hinge_loss([1], [-1]) = 0.0,
hinge_loss([1], [0]) = 1.0,
hinge_loss([1], [1]) = 2.0,
hinge_loss([1], [5]) = 6.0
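For comparison, the textbook hinge loss for a single example is max(0, 1 - y*p); for y_true = 1 that would give 6 at p = -5 and 0 at p = 5, i.e. the mirror image of the values above. A quick reference computation in plain Python (no scikit-learn involved):
# textbook hinge loss max(0, 1 - y*p) for a positive label
for p in (-5, -1, 0, 1, 5):
    print(p, max(0, 1 - 1 * p))  # prints 6, 2, 1, 0, 0 -- the reverse of the hinge_loss() outputs above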
predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
And some specific points in the chart above:
hinge_loss([-1], [-5]) = 0.0,
hinge_loss([-1], [-1]) = 0.0,
hinge_loss([-1], [0]) = 1.0,
hinge_loss([-1], [1]) = 2.0,
hinge_loss([-1], [5]) = 6.0
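Note also that the values are identical to the y_true = 1 case even though the label was negated. A quick check of that observation (based solely on the outputs listed above):
# with a single label, negating y_true makes no difference to hinge_loss()
for p in (-5, -1, 0, 1, 5):
    assert hinge_loss([1], [p]) == hinge_loss([-1], [p]) == max(0, 1 + p)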
Can someone explain to me why hinge_loss() in scikit-learn seems to be the opposite of the first two charts?
Many thanks in advance.
EDIT: Based on the answer, I can reproduce the expected output without the values being flipped. The idea is the following: since hinge_loss([0], [-1]) == 0 and hinge_loss([-2], [-1]) == 0, I can call hinge_loss() with an array of two values, padding each call with one of these zero-loss samples, without altering the calculated loss (apart from a factor of 2, because hinge_loss() averages over the samples).
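For example, a quick check for the y_true = 1 case (the factor of 2 compensates for hinge_loss() averaging over the two samples):
# pad with a zero-loss sample (y=0, decision=-1) and undo the averaging
print(hinge_loss(np.array([1, 0]), np.array([5, -1])) * 2)   # 0.0, matches max(0, 1 - 1*5)
print(hinge_loss(np.array([1, 0]), np.array([-5, -1])) * 2)  # 6.0, matches max(0, 1 - 1*(-5))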
The following code does not flip the values:
predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y, 0]), np.array([p, -1])) * 2
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
    loss[i] = hinge_loss(np.array([y, -2]), np.array([p, -1])) * 2
plt.plot(predicted, loss)
plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')
The question now is why these particular "combinations" of values work in each corresponding case.
Answer:
Having a look at the code underlying the hinge_loss implementation, in the binary case the true labels are passed through LabelBinarizer.fit_transform() before the margins are computed. Because its behaviour with a single label defaults to returning an array of negative labels, the (unique) label's sign gets flipped, which explains the plot you obtain.
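Roughly, the binary branch boils down to something like the following sketch (not the verbatim scikit-learn source, just the relevant steps under that assumption):
from sklearn.preprocessing import LabelBinarizer
import numpy as np

def binary_hinge_sketch(y_true, pred_decision):
    # binarize the labels to {-1, +1}, then average max(0, 1 - y*p)
    lbin = LabelBinarizer(neg_label=-1)
    y = lbin.fit_transform(np.asarray(y_true))[:, 0]
    margins = y * np.asarray(pred_decision, dtype=float)
    return np.mean(np.clip(1 - margins, 0, None))

# with a single label, LabelBinarizer maps everything to the negative label,
# so y_true=[1] is effectively treated as [-1]:
print(LabelBinarizer(neg_label=-1).fit_transform(np.array([1, 1]))[:, 0])  # [-1 -1]
print(binary_hinge_sketch([1], [5]))  # 6.0, same as hinge_loss([1], [5]) above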
Although the single-label example is admittedly quite odd, there has of course been some debate on this issue, see e.g. https://github.com/scikit-learn/scikit-learn/issues/6723. Digging into the GitHub issues, it seems that a final decision on potential fixes has not been reached yet.
Answer to the EDIT:
IMO, the way you're enriching y and p in the loop works because, that way, you're effectively "escaping" from the single-label case (in particular, what really matters is how you're dealing with y). Indeed, y_true = np.array([y, 0]) = np.array([1, 0]) (first case) and y_true = np.array([y, -2]) = np.array([-1, -2]) (second case) are binarized to array([ 1, -1]) and array([ 1, -1]), respectively. On the other hand, the other possibility for your second case that you mention in the comment, namely y_true = np.array([y, -1]) = np.array([-1, -1]), does not let you escape from the single-label case (lbin.fit_transform(np.array([-1, -1]))[:, 0] returns array([-1, -1])); thus you fall back into the "bug" described above.