I got unexpected output while implementing the SGD algorithm for my ML homework.
My training data, a split of the dataset below, has 320 rows:
my dataset: https://github.com/Jangrae/csv/blob/master/carseats.csv
I first did some data preprocessing:
import pandas as pd
from sklearn.preprocessing import StandardScaler
import numpy as np

train_data = pd.read_csv('carseats_train.csv')

# Map the Yes/No columns (Urban, US) to 1/0
train_data.replace({'Yes': 1, 'No': 0}, inplace=True)

# One-hot encode ShelveLoc and swap the dummies in for the original column
onehot_tr = pd.get_dummies(train_data['ShelveLoc'], dtype=int, prefix_sep='_', prefix='ShelveLoc')
train_data = train_data.drop('ShelveLoc', axis=1)
train_data = train_data.join(onehot_tr)

# Sales (the first column) is the target; the remaining columns are features
train_data_Y = train_data.iloc[:, 0]
train_data_X = train_data.drop('Sales', axis=1)
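After this step the feature matrix has 12 columns (the 9 remaining predictors plus the 3 ShelveLoc dummies), which is why the weight matrix below is 12x1. A quick sanity check:

print(train_data_X.shape)           # expected: (320, 12)
print(list(train_data_X.columns))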
Then I implemented the algorithm like this:
learning_rate = 0.01
epoch_num = 50
initial_w = 0.1
intercept = 0.1
w_matrix = np.ones((12, 1)) * initial_w   # one weight per feature

for e in range(epoch_num):
    for i in range(len(train_data_X)):
        # prediction for one sample with the current parameters
        x_i = train_data_X.iloc[i].to_numpy()
        y_i = train_data_Y.iloc[i]
        y_estimated = np.dot(x_i, w_matrix) + intercept
        # per-sample gradients and parameter update
        grad_w = x_i.reshape(-1, 1) * (y_i - y_estimated)
        grad_intercept = (y_i - y_estimated)
        w_matrix = w_matrix - 2 * learning_rate * grad_w
        intercept = intercept - 2 * learning_rate * grad_intercept

print("Final weights:\n", w_matrix)
print("Final intercept:", intercept)
But the output was:
Final weights:
[[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]
[nan]]
Final intercept: [nan]
I ran it with various learning rates, and I also tried a convergence threshold, but I still got the same result. I can't figure out why my code gives me NaNs.
Can anybody see the issue?
You get a numeric overflow in your code: the gradients keep growing until the values become inf and then nan. Consider taking more epochs and a much lower learning rate (a.k.a. "step size") to make your algorithm converge. I was able to get results with a learning rate of 0.000001, but you will have to see for your training set what the "correct" number is, and also monitor the convergence (depending on the number of epochs). You could also consider an adaptive learning rate schedule; a sketch of the monitoring idea is below.
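For example (a minimal sketch of the monitoring idea, not a drop-in fix; sgd_epoch is a hypothetical helper standing in for one pass of your inner per-sample loop):

learning_rate = 0.000001
for e in range(epoch_num):
    # sgd_epoch: hypothetical helper that runs one update sweep over all samples
    w_matrix, intercept = sgd_epoch(train_data_X, train_data_Y, w_matrix, intercept, learning_rate)
    preds = train_data_X.to_numpy() @ w_matrix + intercept
    mse = float(np.mean((train_data_Y.to_numpy().reshape(-1, 1) - preds) ** 2))
    print(f"epoch {e}: mse = {mse:.4f}")
    if not np.isfinite(mse):       # catch the blow-up instead of printing nans at the end
        print("diverged -- lower the learning rate")
        break
    learning_rate *= 0.99          # simple decaying schedule (the factor is arbitrary)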
On another note: I am not exactly sure that your equations are correct. Since you use (y_i - y_estimated) and not the other way around, the gradient carries a minus sign, so you need to update your weights and intercept with + instead of - (a "double minus", if you will). Maybe you can check that again.

PS: Your algorithm is not yet "stochastic": you sweep the samples in the same order every epoch instead of drawing them at random. ;D