I am implementing an SVM and, to find the largest margin, I am using a hinge loss cost function like the one from this article, but I am writing the implementation myself. So far I have come up with the following, which produces odd values when predicting on my data set, even though the data is linearly separable (x_train is a 2D array and y_train is a 1D array containing the binary labels -1 or 1):
import random
import numpy as np

def SVM(x_train, y_train, w="RANDOM", epochs=1000000, learning_rate=0.00001):
    if w == "RANDOM":
        # start from one random weight per feature
        w = []
        for i in range(len(x_train[0])):
            w.append(random.random())
        w = np.array(w)
    for i in range(epochs):
        # prediction is only ever made from the first training sample
        y_pred = w.dot(x_train[0])
        prod = y_pred * y_train
        l = 0
        for val in prod:
            if val >= -1:
                for n in range(len(w)):
                    w[n] = w[n] - learning_rate * 2 * 1/epochs * w[n]
            else:
                for n in range(len(w)):
                    w[n] = w[n] + learning_rate * (y_train[l] * x_train[l][n] - 2 * 1/epochs * w[n])
            l += 1
    return w
def cost_function(y, y_pred):
    return 1 - y * y_pred
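For reference, my understanding of the full hinge loss for a single sample is the version with the clamp at zero; this is only my own reading of the article, so the missing max may be exactly what I have got wrong:

def hinge_loss(y, y_pred):
    # clamp at zero: once a sample is correctly classified with a margin
    # of at least 1, it should contribute no loss (my reading of the article)
    return max(0, 1 - y * y_pred)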
I suspect my problem lies in the prediction rather than in the weight update, as I think I understand gradient descent well enough to implement it.
As far as I can tell, prod = y_pred * y_train is the term that gets subtracted from 1 in the cost function (ignoring the soft margin), but why do we have to iterate through its elements? That is something the article also does in its code.
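My current reading of why the loop exists is that each element of prod belongs to one training sample, and the condition decides whether that sample violates the margin. If that is right, I would expect the per-sample update to look roughly like this (my own sketch using the variables from my function, not the article's code, and indexing x_train with l for each sample):

for l, val in enumerate(prod):
    if val < 1:
        # sample l violates the margin: pull w towards classifying it
        # correctly, plus the regularisation shrinkage
        w = w + learning_rate * (y_train[l] * x_train[l] - 2 * (1/epochs) * w)
    else:
        # sample l is safely outside the margin: only the regularisation
        # term applies
        w = w - learning_rate * 2 * (1/epochs) * w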
In addition, I'm unsure whether predicting y based on only one element of x_train (the code only ever uses x_train[0]) makes sense for training the model.
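Concretely, I would have expected one prediction per row of x_train, something like the following (again just my assumption, not something I have confirmed against the article):

y_pred = x_train.dot(w)    # one prediction per training sample, shape (n_samples,)
prod = y_pred * y_train    # one margin value per sample, fed into the hinge loss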