With reference to the paper https://arxiv.org/abs/1704.04289 (Mandt, Hoffman, and Blei, "Stochastic Gradient Descent as Approximate Bayesian Inference"), I am trying to implement Section 7.3 on optimizing hyperparameters, specifically Equation 35 on page 25 of the paper.
The negative log-likelihood there seems more complicated than the one for ordinary logistic regression. As a starting point, I tried to implement the log-likelihood and the gradient update (ascent on the log-likelihood) for logistic regression, as per my code below.
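For reference, the objective my code maximizes is the standard Bernoulli log-likelihood for logistic regression, which the paper's equation presumably extends with extra hyperparameter-dependent terms:

log L(w) = sum_i [ y_i * (w^T x_i) - log(1 + exp(w^T x_i)) ]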
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
%matplotlib inline
#simulating data to fit logistic regression model
np.random.seed(12)
num_observations = 5000
x1 = np.random.multivariate_normal([0,0], [[1, .75],[.75, 1]], num_observations)
x2 = np.random.multivariate_normal([1,4], [[1, .75],[.75, 1]], num_observations)
simulated_features = np.vstack((x1, x2)).astype(np.float32)
simulated_labels = np.hstack((np.zeros(num_observations), np.ones(num_observations)))
plt.figure(figsize=(12,8))
plt.scatter(simulated_features[:,0], simulated_features[:,1], c = simulated_labels, alpha = .4)
#add a column of ones to deal with bias term
ones = np.ones((num_observations*2, 1))
Xb = np.hstack((ones, simulated_features))
#Activation Function
def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))
#log-likelihood function (note: this returns the log-likelihood itself, not its negative)
def log_likelihood(features, target, weights):
    #model output: linear scores
    scores = np.dot(features, weights)
    ll = np.sum(target*scores - np.log(1 + np.exp(scores)))
    return ll
def log_reg(features, target, num_steps, learning_rate):
    weights = np.zeros(features.shape[1])
    for step in range(num_steps):
        scores = np.dot(features, weights)  #was `score`, which made the next line fail
        predictions = sigmoid(scores)
        #update weights with gradient (ascent on the log-likelihood)
        error = target - predictions
        gradient = np.dot(features.T, error)
        weights += learning_rate * gradient
        if step % 10000 == 0:
            print(log_likelihood(features, target, weights))
    return weights
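For completeness, this is how I run it on the simulated data and sanity-check it against scikit-learn. The step count and learning rate are just values I picked, not from the paper; C=1e15 effectively turns off sklearn's regularization so the two fits should roughly agree.

weights = log_reg(Xb, simulated_labels, num_steps=50000, learning_rate=5e-5)
print(weights)

#sanity check against scikit-learn with (almost) no regularization
from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(fit_intercept=True, C=1e15)
clf.fit(simulated_features, simulated_labels)
print(clf.intercept_, clf.coef_)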
The biggest challenge I am facing is implementing the terms lambda, DK, theta(dk) and theta(dyn) from the equation in the paper. I understand the derivation in theory and was able to work it through by hand on paper, but I am finding it hard to implement it in Python on simulated data like the above. Can anyone guide me in how this can be implemented? My best attempt at a reading of the equation is sketched below.
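This is only a sketch under my own assumptions, not the paper's method: I am guessing that lambda acts like a Gaussian prior precision over the D weights, so the objective becomes the negative log-likelihood plus (lambda/2) * sum(theta**2), with lambda re-estimated by the closed-form type-II update lambda = D / sum(theta**2). The names penalized_nll, update_lambda and log_reg_penalized are mine, and theta(dk)/theta(dyn) are not modeled here at all.

#sketch: L2-penalized logistic regression with an EM-style update for lambda
#ASSUMPTION: Equation 35 reduces to a Gaussian prior N(0, 1/lambda) on the weights;
#theta(dk) and theta(dyn) from the paper are NOT modeled here
def penalized_nll(features, target, weights, lam):
    #negative log-likelihood plus the Gaussian-prior penalty
    return -log_likelihood(features, target, weights) + 0.5 * lam * np.sum(weights**2)

def update_lambda(weights):
    #closed-form type-II maximum-likelihood update for the prior precision
    return weights.shape[0] / np.sum(weights**2)

def log_reg_penalized(features, target, num_steps, learning_rate, lam=1.0):
    weights = np.zeros(features.shape[1])
    for step in range(num_steps):
        predictions = sigmoid(np.dot(features, weights))
        #gradient of the penalized objective, so this is a descent step
        gradient = -np.dot(features.T, target - predictions) + lam * weights
        weights -= learning_rate * gradient
        if step % 10000 == 0 and step > 0:
            lam = update_lambda(weights)  #re-estimate the hyperparameter
            print(penalized_nll(features, target, weights, lam), lam)
    return weights, lam

If this reading of Equation 35 is wrong, corrections to the sketch would help me just as much.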