Overflow when calculating gradient partial derivatives in Python

I wrote a function in Python that computes simple gradient partial derivatives to update the variables of a very basic function-approximation neural network with no activation functions. It takes a single input x1 and tries to estimate an output y.

I also have a standardization function. When I standardize the input data set first and then run my code, there are no issues. But when I feed the raw data set through the same gradient-partial-derivatives function, the updates to the variables m and b blow up almost immediately and I get an overflow error.

Does anyone know how I can solve this? I have figured out that it happens because the updated m and b get looped back into the estimate y, but I am unsure how to resolve it. From some quick googling I didn't see any solutions other than people saying overflow occurs when a number exceeds the limits of the data type you are using. How do I stop the overflow from occurring?
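
For reference, here is a stripped-down repro of just the update rule on made-up raw-scale numbers (hypothetical data, not the real CSV), with the fixed divisor N = 100 I was originally dividing by. As far as I can tell, one update can be written as m_new = m*(1 - alpha*(2/N)*x**2) + (terms not involving m), so whenever alpha*(2/N)*x**2 is greater than 2 the multiplier on m has magnitude greater than 1 and m grows geometrically:

import numpy as np

# Hypothetical raw-scale points, not the real student_marks.csv data
x_raw = np.array([55.0, 70.0, 62.0, 90.0])
t_raw = np.array([60.0, 75.0, 58.0, 88.0])

m, b, alpha, N = -0.5, 0.0, 0.1, 100
for epoch in range(100):
    for xi, ti in zip(x_raw, t_raw):
        e = ti - (b + m * xi)
        # The per-sample multiplier on m is (1 - alpha*(2/N)*xi**2);
        # with xi around 55-90 its magnitude is roughly 5-15, so m
        # grows geometrically and overflows (NumPy RuntimeWarning)
        # long before the loop finishes.
        m = m + alpha * (2 / N) * e * xi
        b = b + alpha * (2 / N) * e
print(m, b)  # inf/nan by now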

import os
import numpy as np
import random
import urllib.request


########################## Setting the working directory #####################
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)

########## Downloading the data set to the current working directory #########
url = 'https://raw.githubusercontent.com/tofighi/MachineLearning/master/datasets/student_marks.csv'
urllib.request.urlretrieve(url,filename = 'data.csv')
data = np.genfromtxt(fname='data.csv', dtype=float, delimiter=',', skip_header=0)
data = np.delete(data, 0, axis=0)  # drop the header row


def standardization(data):
    # Z-score each column: subtract its mean and divide by its standard deviation
    x = np.zeros((len(data), 2))
    mean_data1 = np.mean(data[:, 0])
    std_data1 = np.std(data[:, 0])

    mean_data2 = np.mean(data[:, 1])
    std_data2 = np.std(data[:, 1])

    for i in range(len(x)):
        x[i, 0] = (data[i, 0] - mean_data1) / std_data1
        x[i, 1] = (data[i, 1] - mean_data2) / std_data2
    return x

def gradient_Partial_Derivatives(nEpoch, N, b, m, x):
    #m_tracker = np.zeros(nEpoch*N)
    #b_tracker = np.zeros(nEpoch*N)
    error = np.zeros(nEpoch * N)

    counter = 0
    error_sum = 0
    sum_counter = 1
    # Training m and b
    for epoch in range(nEpoch):
        # Visit the samples in a random order each epoch
        sp = random.sample(range(len(x)), len(x))

        for j in range(N):
            # Training estimate for y: predict the final grade from the midterm
            y = b + m * x[sp[j], 0]

            # Error between the estimated final y and the target final x[sp[j], 1].
            # This is not the error function, just e = y_actual - y_estimate
            e = x[sp[j], 1] - y

            # Update m and b using the partial derivatives
            # (alpha is the global learning rate defined below)
            m = m + alpha * (2 / sum_counter) * e * x[sp[j], 0]
            b = b + alpha * (2 / sum_counter) * e

            # Track the running mean squared error
            er = (x[sp[j], 1] - y) ** 2
            error_sum = error_sum + er
            error[counter] = error_sum / sum_counter
            #m_tracker[counter] = m
            #b_tracker[counter] = b
            counter = counter + 1
            sum_counter = sum_counter + 1

    return m, b, error

########################### Initializing Variables ###########################
m = -0.5
b = 0
alpha = 0.1

##############################################################################
##############################################################################
############################## Standardization ###############################

#Standardizing the input
x_standard = standardization(data)

#Calculating partial derivatives and updating m and b (1 epoch of 100 updates)
m_final, b_final, er = gradient_Partial_Derivatives(1, 100, b, m, x_standard)

#Calculating partial derivatives and updating m and b for 2000 iterations (20 epochs)
m_final1, b_final1, er1 = gradient_Partial_Derivatives(20, 100, b, m, x_standard)

##############################################################################
##############################################################################
############################ No Standardization ##############################


#Calculating partial derivatives and updating m and b on the raw data
m_final2, b_final2, er2 = gradient_Partial_Derivatives(1, 100, b, m, data)

#Calculating partial derivatives and updating m and b for 2000 iterations (20 epochs)
m_final3, b_final3, er3 = gradient_Partial_Derivatives(20, 100, b, m, data)

1 Answer

Olek

So for those reading: I added sum_counter just before posting here to see if it would solve the overflow in the gradient-partial-derivatives function. I thought it had not worked, but after running the code again I get no overflow error, and that was the only change I made. Before, I was dividing the m and b updates by a variable called N, which was always set to 100 and did not grow as the updates accumulated. Dividing by the accumulating sum_counter instead seems to have fixed everything.
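
To see why the divisor matters, here is a minimal sketch contrasting the two on made-up raw-scale numbers (hypothetical data, shuffling omitted for simplicity). Dividing by the accumulating counter makes the effective step size decay like 1/t, which eventually tames the large raw-scale gradient terms; the fixed divisor keeps the step size constant, so m grows geometrically and overflows:

import numpy as np

# Hypothetical raw-scale data, not the real student_marks.csv
x_raw = np.array([35.0, 42.0, 38.0, 45.0])
y_raw = np.array([40.0, 47.0, 41.0, 50.0])

def fit(decaying_step, alpha=0.1, epochs=300):
    m, b, t = -0.5, 0.0, 1
    for _ in range(epochs):
        for xi, ti in zip(x_raw, y_raw):
            e = ti - (b + m * xi)
            # Fixed divisor: the step size stays at alpha*2/100 forever.
            # Accumulating counter: the step size shrinks like 1/t.
            denom = t if decaying_step else 100
            m += alpha * (2 / denom) * e * xi
            b += alpha * (2 / denom) * e
            t += 1
    return m, b

print(fit(decaying_step=False))  # overflows: m and b end up inf/nan
print(fit(decaying_step=True))   # stays finite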