I wrote a function in Python that computes the gradient partial derivatives used to update the variables in a very basic function-approximation neural network with no activation functions. It takes a single input x1 and tries to estimate an output y.
I also have a standardization function, and when I standardize the input data set first and then run my code, everything works fine. But when I run the raw data set through the same gradient partial derivatives function, the updates to the variables m and b blow up immediately and I get an overflow error.
Does anyone know how I can solve this? I have figured out that it happens because the updated values of m and b get looped back into the calculation of y, but I am unsure how to resolve it. From some quick googling, the only explanation I found is that overflow occurs when a number exceeds the limit of the data type you are using. How do I stop the overflow from occurring?
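To illustrate what I mean, here is a stripped-down sketch with made-up numbers (not the real data set) of the update rule applied repeatedly to a single raw, unscaled point. With a fixed step size the estimate overshoots the target further on every pass, so m and b swing between ever larger positive and negative values until they exceed what a float can hold:

    # Made-up single data point on the raw (unstandardized) scale.
    alpha = 0.1
    x_raw, target = 10.0, 7.0
    m, b = -0.5, 0.0

    for step in range(5):
        y = b + m * x_raw               # estimate
        e = target - y                  # error
        m = m + alpha * 2 * e * x_raw   # same form of update as in my function
        b = b + alpha * 2 * e
        print(step, m, b)
    # Each pass multiplies the error by (1 - 2*alpha*(1 + x_raw**2)),
    # about -19 here, so after a few hundred updates the values overflow.

With standardized inputs that factor is much closer to 1 in magnitude, which I assume is why the standardized run does not blow up.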
import os
import numpy as np
import random
import csv
import urllib.request
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
########################## Setting the working directory #####################
abspath = os.path.abspath(__file__)
dname = os.path.dirname(abspath)
os.chdir(dname)
########## Downloading the data set to the current working directory #########
url = 'https://raw.githubusercontent.com/tofighi/MachineLearning/master/datasets/student_marks.csv'
urllib.request.urlretrieve(url,filename = 'data.csv')
data = np.genfromtxt(fname = 'data.csv', dtype = float, delimiter = ',', skip_header=0)
data = np.delete(data,(0),axis=0)
def standardization(data):
    x = np.zeros((len(data), 2))
    mean_data1 = np.mean(data[:, 0])
    std_data1 = np.std(data[:, 0])
    mean_data2 = np.mean(data[:, 1])
    std_data2 = np.std(data[:, 1])
    for i in range(0, len(x)):
        x[i, 0] = (data[i, 0] - mean_data1)/std_data1
        x[i, 1] = (data[i, 1] - mean_data2)/std_data2
    return x
def gradient_Partial_Derivatives(nEpoch, N, b, m, x):
    #m_tracker = np.zeros(nEpoch*N)
    #b_tracker = np.zeros(nEpoch*N)
    error = np.zeros(nEpoch*N)
    counter = 0
    error_sum = 0
    sum_counter = 1
    #Training m and b (alpha is the global learning rate defined below)
    for epoch in range(0, nEpoch):
        a = range(0, len(x))
        sp = random.sample(a, len(x))
        for j in range(0, N):
            #Calculate new final grade based on midterm. Training estimate for y.
            y = b + m*x[sp[j], 0]
            #Find the error between the estimated final y and the target final x[sp[j],1].
            #This is not the error function but just e = y_actual - y_estimate
            e = x[sp[j], 1] - y
            #Update m and b using the partial derivatives
            m = m + alpha*(2/sum_counter)*e*x[sp[j], 0]
            b = b + alpha*(2/sum_counter)*e
            er = (x[sp[j], 1] - y)**2
            error_sum = error_sum + er
            error[counter] = error_sum/sum_counter
            #m_tracker[counter] = m
            #b_tracker[counter] = b
            counter = counter + 1
            sum_counter = sum_counter + 1
    return m, b, error
########################### Initializing Variables ###########################
m = -0.5
b = 0
alpha = 0.1  #learning rate
##############################################################################
##############################################################################
############################## Standardization ###############################
#Standardizing the input
x_standard = standardization(data)
#Calculating the partial derivatives and updating m and b
m_final, b_final, er = gradient_Partial_Derivatives(1,100, b, m, x_standard)
#Calculating the partial derivatives and updating m and b for 2000 iterations
m_final1, b_final1, er1 = gradient_Partial_Derivatives(20,100, b, m, x_standard)
##############################################################################
##############################################################################
############################ No Standardization ##############################
#Calculating the partial derivatives and updating m and b
m_final2, b_final2, er2 = gradient_Partial_Derivatives(1,100, b, m, data)
#Calculating the partial derivatives and updating m and b for 2000 iterations
m_final3, b_final3, er3 = gradient_Partial_Derivatives(20,100, b, m, data)
For those reading: I added sum_counter just before posting here to see whether it would solve the overflow in gradient_Partial_Derivatives. I initially thought it had not worked, but after running the code again I no longer get an overflow error, and that was the only change I made. Previously I was dividing the m and b updates by a variable called N, which was always set to 100 and did not grow as the updates accumulated. Letting the divisor grow with each update seems to have fixed everything.
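If it helps anyone hitting the same thing, here is the same made-up single-point sketch as above, but with the divisor growing on every update the way sum_counter does. The effective step size alpha*2/sum_counter keeps shrinking, so the updates stop overshooting after a while and the error settles down instead of overflowing:

    # Same made-up point as in the sketch at the top of the question.
    alpha = 0.1
    x_raw, target = 10.0, 7.0
    m, b = -0.5, 0.0
    sum_counter = 1

    for step in range(2000):
        y = b + m * x_raw
        e = target - y
        m = m + alpha * (2 / sum_counter) * e * x_raw
        b = b + alpha * (2 / sum_counter) * e
        sum_counter = sum_counter + 1

    print(m, b, target - (b + m * x_raw))  # the error ends up tiny, no overflow

The first few updates still overshoot, but the shrinking step size reins them back in. How quickly that happens will obviously depend on the scale of the raw data and on alpha, so standardizing the inputs first still seems like the safer option.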