Log-likelihood and gradient function implementation in Python


With reference to the scientific paper https://arxiv.org/abs/1704.04289, I am trying to implement Section 7.3, "Optimising hyperparameters", specifically Equation 35 on page 25 of the paper.

The negative log-likelihood function there seems more complicated than that of a usual logistic regression. I tried to implement the negative log-likelihood and gradient descent for logistic regression, as per my code below.
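For reference, the (positive) log-likelihood computed by the log_likelihood function below is the standard binary logistic regression one, l(w) = sum_i [ y_i * (x_i^T w) - log(1 + exp(x_i^T w)) ]; the negative log-likelihood is just its negation.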

import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
%matplotlib inline

#simulating data to fit logistic regression model
np.random.seed(12)
num_observations = 5000

x1 = np.random.multivariate_normal([0,0], [[1, .75],[.75, 1]], num_observations)
x2 = np.random.multivariate_normal([1,4], [[1, .75],[.75, 1]], num_observations)

simulated_features = np.vstack((x1, x2)).astype(np.float32)
simulated_labels = np.hstack((np.zeros(num_observations), np.ones(num_observations)))

plt.figure(figsize=(12,8))

plt.scatter(simulated_features[:,0], simulated_features[:,1], c = simulated_labels, alpha = .4)

#add a column of ones to deal with bias term
ones = np.ones((num_observations*2, 1))

Xb = np.hstack((ones, simulated_features))

#Activation Function

def sigmoid(scores):
    return 1 / (1 + np.exp(-scores))


#log-likelihood function

def log_likelihood(features, target, weights):
    #model output (log-odds)
    scores = np.dot(features, weights)
    #log-likelihood (negate this to get the NLL)
    ll = np.sum(target*scores - np.log(1 + np.exp(scores)))
    return ll
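
Note that for large positive scores, np.exp(scores) overflows. A numerically stable variant (a sketch using np.logaddexp, which evaluates log(exp(a) + exp(b)) without overflow) would be:

def log_likelihood_stable(features, target, weights):
    scores = np.dot(features, weights)
    # np.logaddexp(0, scores) == np.log(1 + np.exp(scores)), computed stably
    return np.sum(target * scores - np.logaddexp(0, scores))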

def log_reg(features, target, num_steps, learning_rate):
    weights = np.zeros(features.shape[1])
    
    for step in range(num_steps):
        scores = np.dot(features, weights)
        predictions = sigmoid(scores)

        #gradient ascent on the log-likelihood
        #(equivalently, gradient descent on the negative log-likelihood)
        error = target - predictions
        gradient = np.dot(features.T, error)
        weights += learning_rate * gradient
        
        if step % 10000 == 0:
            print(log_likelihood(features, target, weights))
        
    return weights
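
This is how I call it; the step count and learning rate here are arbitrary choices of mine, and the sklearn comparison (with a large C to effectively disable regularization) is just a sanity check:

weights = log_reg(Xb, simulated_labels, num_steps=50000, learning_rate=5e-5)
print(weights)

from sklearn.linear_model import LogisticRegression
clf = LogisticRegression(fit_intercept=True, C=1e15)
clf.fit(simulated_features, simulated_labels)
print(clf.intercept_, clf.coef_)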

The biggest challenge I am facing here is implementing the terms lambda, DK, theta(DK) and theta(dyn) from the equation in the paper. Theoretically I understand the implementation, and I was able to work it out by hand on paper, but I am finding it hard to implement it in Python with simulated data (as shown in my code). Can anyone guide me on how this can be implemented?
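
For context, what I can write down so far is only the generic empirical-Bayes pattern that Section 7.3 builds on, not Equation 35 itself: with a zero-mean Gaussian prior N(0, (1/lambda) I) over D weights, maximizing (D/2)*log(lambda) - (lambda/2)*||theta||^2 over lambda gives a closed-form update (the function name and the interleaving below are hypothetical placeholders of mine):

def update_lambda(weights):
    # empirical-Bayes update for the precision of a zero-mean Gaussian prior:
    # argmax_lambda (D/2)*log(lambda) - (lambda/2)*||theta||^2  =>  lambda = D / ||theta||^2
    D = weights.shape[0]
    return D / np.dot(weights, weights)

Interleaving this with the weight updates (and adding the prior's contribution, -lambda * weights, to the gradient) gives a MAP-style loop, but I do not see how to map the paper's theta(DK) and theta(dyn) terms onto it.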
