Does a unique solution exist when optimizing the cross entropy in a binary logistic regression problem?


I tried to build a logistic regression model from scratch, using the Iris dataset and the example from Chapter 4 of Géron's ML book. I wanted to see whether three different fitting methods produce the same model parameters.

import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
X = iris["data"][:, 3:]                  # petal width only
y = (iris["target"] == 2).astype(int)    # 1 for Iris-Virginica, 0 otherwise
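
For reference, X here is just the petal width column and y flags the Iris-Virginica class, so the shapes are (150, 1) and (150,). A quick check:

print(X.shape, y.shape)   # (150, 1) (150,)
print(y.sum())            # 50 positive (Virginica) samples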

I first fit the model with scikit-learn's LogisticRegression.

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X, y)

model.coef_.item(), model.intercept_.item()

showed the slope and intercept values (4.33, -7.19). Then I wrote the cross entropy objective function myself and used scipy's minimize to find the slope and intercept:

from scipy.optimize import minimize

XX = np.hstack((X, np.ones((150, 1))))   # append a column of ones for the intercept term

def obj(w, xa, ya):
    # Mean binary cross entropy; ya is reshaped to a column so it
    # broadcasts element-wise against pred, which has shape (n, 1).
    logit = xa.dot(w).reshape(-1, 1)
    pred = 1. / (1. + np.exp(-logit))
    ya = ya.reshape(-1, 1)
    return -np.mean(ya * np.log(pred) + (1. - ya) * np.log(1. - pred))

res = minimize(obj, args=(XX, y), x0=np.array([0.5, 0.5]), method='BFGS', options={'gtol': 1e-2})

res.x

This time the slope and intercept came out as (8.16, -13.34).
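
One thing I did not account for: scikit-learn's LogisticRegression applies L2 regularization by default (C=1.0), while my objective has no penalty term. A sketch of refitting with the penalty disabled, in case that explains part of the gap (this assumes a recent scikit-learn that accepts penalty=None; older releases spell it penalty='none'):

# Refit without the default L2 penalty for an apples-to-apples comparison.
model_unreg = LogisticRegression(penalty=None, max_iter=1000)
model_unreg.fit(X, y)
model_unreg.coef_.item(), model_unreg.intercept_.item()

I would expect this unregularized fit to land closer to the scipy result than to (4.33, -7.19).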

I also tried a third way, gradient descent (below), and got a different pair of values again.

total = 10000
theta = np.random.randn(2, 1)   # random initialization: [slope, intercept]
learning_rate = 0.15

for i in range(total):
    pred = 1. / (1. + np.exp(-XX.dot(theta)))
    deltaT = pred - y.reshape(-1, 1)          # reshape y to a column so deltaT stays (150, 1)
    theta = theta - learning_rate * XX.T.dot(deltaT) / 150.
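
In case 10,000 steps with this learning rate are simply not enough to converge, here is a sketch that reuses the obj function above to watch the cross entropy during training (same update rule, loss printed every 2,000 iterations):

theta = np.random.randn(2, 1)
for i in range(total):
    pred = 1. / (1. + np.exp(-XX.dot(theta)))
    theta = theta - learning_rate * XX.T.dot(pred - y.reshape(-1, 1)) / 150.
    if i % 2000 == 0:
        print(i, obj(theta.ravel(), XX, y))   # should keep decreasing

print(theta.ravel(), obj(theta.ravel(), XX, y))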

Some observations are in order.

  1. The three methods produce the same predicted classes (see the sketch after this list).
  2. The final values of the objective function are different.
  3. The intercept/slope ratio is close for all three fits, and in a binary classification problem with one-dimensional input this ratio (the decision boundary) seems to be the only number that matters for the predictions.
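
To make observations 1 and 3 concrete, here is a sketch comparing the predicted labels and the implied decision boundaries (the petal width where the probability crosses 0.5, i.e. -intercept/slope) for the three fits; model, res, and theta are the objects from above:

w_sk, b_sk = model.coef_.item(), model.intercept_.item()
w_bf, b_bf = res.x                  # scipy/BFGS solution
w_gd, b_gd = theta.ravel()          # gradient descent solution

def labels(w, b):
    # predicted class: probability >= 0.5 is the same as w*x + b >= 0
    return (X.ravel() * w + b >= 0).astype(int)

print((labels(w_sk, b_sk) == labels(w_bf, b_bf)).all())
print((labels(w_sk, b_sk) == labels(w_gd, b_gd)).all())
print(-b_sk / w_sk, -b_bf / w_bf, -b_gd / w_gd)   # decision boundaries in cm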

The objective function is convex (the Hessian of the cross entropy is positive semi-definite), so from a gradient descent point of view shouldn't we always reach the global minimum, no matter where the initial point is?
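
As a sanity check on the convexity claim: the Hessian of the (unpenalized) mean cross entropy is X^T diag(p(1-p)) X / n, which should be positive semi-definite at any parameter vector. A quick numerical check:

def hessian(w, xa):
    p = 1. / (1. + np.exp(-xa.dot(w)))            # predicted probabilities, shape (n,)
    s = p * (1. - p)                              # diagonal weights
    return xa.T.dot(xa * s[:, None]) / len(p)     # X^T diag(s) X / n

for w in (np.zeros(2), res.x, theta.ravel()):
    print(np.linalg.eigvalsh(hessian(w, XX)))     # eigenvalues should be >= 0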
