custom hinge loss vs sklearn hinge loss


I've written a custom hinge loss function based on the hinge loss formula and tested it on a dataset alongside sklearn.metrics.hinge_loss, but the outcomes are very different.

Could someone please take a look and let me know what I am getting wrong here?
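
For reference, the formula I am working from is the standard hinge loss: for a label y in {-1, +1} and a decision value f(x), the per-sample loss is max(0, 1 - y*f(x)), averaged over the samples. A minimal sketch of that (the names here are just illustrative):

import numpy as np

# Textbook hinge loss: mean over samples of max(0, 1 - y * f(x)),
# with y in {-1, +1} and f(x) a real-valued decision score.
def reference_hinge(y, fx):
    return np.mean(np.maximum(0.0, 1.0 - y * fx))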

So first of all this is the data:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import hinge_loss
import numpy as np

data = load_breast_cancer()

features = data.data
labels = data.target.reshape(-1, 1)

feat_train, feat_test, labels_train, labels_test = train_test_split(features, labels, test_size=0.2)

sample_weight = np.random.rand(len(labels_test))
clf = SVC(kernel='linear', C=1.0, random_state=42)
clf.fit(feat_train, labels_train)
y_pred = clf.predict(feat_test)
loss = hinge_loss(labels_test, y_pred, sample_weight=sample_weight)

print("Hinge loss:", loss)

The result is:

Hinge loss: 0.09835019117405303

Now this is my custom function:

def hinge_loss_full(feature_matrix, labels, theta, theta_0):
    # per-sample hinge: max(0, 1 - y * (theta . x + theta_0)), then averaged
    lf = 1 - labels * (np.sum(feature_matrix * theta, axis=1) + theta_0)
    max_lf = np.maximum(0, lf)
    result = np.mean(max_lf)

    return result
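
Here theta and theta_0 are meant to be the weight vector and bias of a linear classifier, so with the fitted SVC above they would correspond to clf.coef_[0] and clf.intercept_[0]. A rough sketch of that call (mapping the 0/1 targets to -1/+1, which the hinge formula assumes):

theta = clf.coef_[0]                     # learned weight vector, shape (n_features,)
theta_0 = clf.intercept_[0]              # learned bias (scalar)
y_signed = 2 * labels_test.ravel() - 1   # map 0/1 targets to -1/+1
print(hinge_loss_full(feat_test, y_signed, theta, theta_0))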

And if I plug in the same values I used for the sklearn call, this is the output:

hinge_loss_full(labels_test, y_pred, theta=sample_weight, theta_0=1)

2.3944007220115626

Why is there such a difference?

1 Answer

Answer from Woker001:
  1. The shapes of the NumPy arrays must match.

  2. Each library calculates the loss function differently, so the CS231n hinge loss and the scikit-learn hinge loss formulas are not the same (much as, e.g., Keras has its own sigmoid conventions).

  3. A function that gives the same value as the scikit-learn one is as follows.

def hinge_loss(y_true, y_pred, sample_weight):
    # scikit-learn's binary hinge loss: weighted mean of max(0, 1 - y * f(x))
    margin = y_true * y_pred
    loss = np.maximum(0, 1 - margin)
    return np.average(loss, weights=sample_weight)

hinge_loss(np.squeeze(labels_test), np.squeeze(y_pred), sample_weight)
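
As a quick check (a sketch only: it uses synthetic -1/+1 labels and random scores rather than the question's data, and assumes the hinge_loss function defined above is in scope; scikit-learn internally remaps binary y_true to -1/+1, while the function above uses y_true as given):

import numpy as np
from sklearn.metrics import hinge_loss as sk_hinge_loss

rng = np.random.default_rng(0)
y_true = rng.choice([-1, 1], size=100)   # labels already encoded as -1/+1
scores = rng.normal(size=100)            # decision-function style values
weights = rng.random(100)

print(hinge_loss(y_true, scores, weights))                   # custom function above
print(sk_hinge_loss(y_true, scores, sample_weight=weights))  # scikit-learn

Also note that scikit-learn documents pred_decision as the output of decision_function, so in the question's setup the usual call would pass clf.decision_function(feat_test) rather than clf.predict(feat_test).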