Understanding hinge loss in scikit-learn

I have seen this hinge loss chart:

https://math.stackexchange.com/questions/782586/how-do-you-minimize-hinge-loss

And also here:

https://programmathically.com/understanding-hinge-loss-and-the-svm-cost-function/

However, when I create the "same" graph using scikit-learn, the result is quite similar but seems to be the "opposite". The code is as follows:

from sklearn.metrics import hinge_loss
import matplotlib.pyplot as plt
import numpy as np


predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
  loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

[plot: hinge loss for y_true == 1]

And some specific points in the chart above:

hinge_loss([1], [-5]) = 0.0,
hinge_loss([1], [-1]) = 0.0,
hinge_loss([1], [0])  = 1.0,
hinge_loss([1], [1])  = 2.0,
hinge_loss([1], [5])  = 6.0
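
For comparison, the textbook definition max(0, 1 - y * p) gives exactly the mirrored values for y_true == 1. A quick check with a plain Python helper (my own, not part of scikit-learn):

def textbook_hinge(y, p):
    # standard hinge loss: max(0, 1 - y * p)
    return max(0.0, 1.0 - y * p)

for p in [-5, -1, 0, 1, 5]:
    print(p, textbook_hinge(1, p))   # 6.0, 2.0, 1.0, 0.0, 0.0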

The same code for y_true == -1:

predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
  loss[i] = hinge_loss(np.array([y]), np.array([p]))
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

[plot: hinge loss for y_true == -1]

And some specific points in the chart above:

hinge_loss([-1], [-5]) = 0.0,
hinge_loss([-1], [-1]) = 0.0,
hinge_loss([-1], [0])  = 1.0,
hinge_loss([-1], [1])  = 2.0,
hinge_loss([-1], [5])  = 6.0

Can someone explain why hinge_loss() in scikit-learn seems to be the opposite of the first two charts linked above?

Many thanks in advance


EDIT: Based on the answer, I can reproduce the same output without flipping the values. This relies on the fact that hinge_loss([0], [-1]) == 0 and hinge_loss([-2], [-1]) == 0, so I can pad the call to hinge_loss() with a second value without altering the calculated loss (the * 2 below compensates for hinge_loss() averaging over the two samples).

The following code does not flip the values:

predicted = np.arange(-10, 11, 1)
y_true = [1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
  loss[i] = hinge_loss(np.array([y, 0]), np.array([p, -1])) * 2
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

And likewise for y_true == -1:

predicted = np.arange(-10, 11, 1)
y_true = [-1] * len(predicted)
loss = [0] * len(predicted)
for i, (p, y) in enumerate(zip(predicted, y_true)):
  loss[i] = hinge_loss(np.array([y,-2]), np.array([p,-1])) * 2
plt.plot(predicted, loss)

plt.axvline(x = 0, color = 'm', linestyle='dashed')
plt.axvline(x = -1, color = 'r', linestyle='dashed')
plt.axvline(x = 1, color = 'g', linestyle='dashed')

The question now is why, in each case, those particular "combinations" of values work.

1 Answer

Answered by amiola (accepted):

Having a look at the code underlying the hinge_loss implementation, the following is what happens in the binary case:

lbin = LabelBinarizer(neg_label=-1)
y_true = lbin.fit_transform(y_true)[:, 0]      # binarize the labels to {-1, 1}

try:
    margin = y_true * pred_decision            # functional margin y * f(x)
except TypeError:
    raise TypeError("pred_decision should be an array of floats.")

losses = 1 - margin
np.clip(losses, 0, None, out=losses)           # hinge: max(0, 1 - margin)
np.average(losses, weights=sample_weight)      # average over the samples (this is the value returned)

LabelBinarizer.fit_transform() defaults to returning an array of negative labels when it sees a single (unique) label:

from sklearn.preprocessing import LabelBinarizer

lbin = LabelBinarizer(neg_label=-1)
lbin.fit_transform([1, 1, 1, 1, 1, 1, 1])   # returns array([[-1], [-1], [-1], [-1], [-1], [-1], [-1]])

This implies that the (unique) label's sign gets flipped, which explains the plot you obtain.
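
To make the flip concrete on one of the points above (a small verification, not part of the original snippet):

from sklearn.metrics import hinge_loss
from sklearn.preprocessing import LabelBinarizer

lbin = LabelBinarizer(neg_label=-1)
print(lbin.fit_transform([1])[:, 0])   # [-1]: the single label 1 is binarized to -1

# hence margin = (-1) * (-5) = 5 and the loss is max(0, 1 - 5) -> 0
print(hinge_loss([1], [-5]))           # 0.0, whereas the textbook value for y = 1, p = -5 would be 6.0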

Even though an example with a single label is admittedly unusual, there has of course been some debate on this issue; see e.g. https://github.com/scikit-learn/scikit-learn/issues/6723. Digging into the GitHub issues, it seems that no final decision has been reached yet on a potential fix.

Answer to the EDIT:

IMO, the way you're enriching y and p in the loop works because, that way, you're effectively "escaping" the single-label case (in particular, what really matters is how you deal with y). Indeed,

lbin = LabelBinarizer(neg_label=-1)
lbin.fit_transform(y_true)[:, 0]

with y_true = np.array([y, 0]) = np.array([1, 0]) (first case) or y_true = np.array([y, -2]) = np.array([-1, -2]) (second case) returns array([1, -1]) in both cases. On the other hand, the other possibility you mention in the comment for the second case, namely y_true = np.array([y, -1]) = np.array([-1, -1]), does not let you escape the single-label case (lbin.fit_transform(np.array([-1, -1]))[:, 0] returns array([-1, -1])), so you fall back into the "bug" described above.
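
As a short check of both padded calls (again, just a verification sketch, not taken from the question or the docs):

import numpy as np
from sklearn.metrics import hinge_loss
from sklearn.preprocessing import LabelBinarizer

lbin = LabelBinarizer(neg_label=-1)
print(lbin.fit_transform(np.array([1, 0]))[:, 0])     # [ 1 -1]: two classes, so the label y = 1 keeps its sign
print(lbin.fit_transform(np.array([-1, -2]))[:, 0])   # [ 1 -1]: two classes, so no fallback to the single-label case

# the padding sample contributes zero loss, so doubling the average of the two
# per-sample losses recovers the loss of the sample of interest
print(hinge_loss(np.array([1, 0]), np.array([-5, -1])) * 2)   # 6.0, the textbook hinge loss for y = 1, p = -5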