Problem with gradient descent least squares code


I'm trying to use gradient descent on a data set. What I have written is

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

data = pd.read_csv('C:/Users/Teacher/Downloads/data.csv')
X = data.iloc[:, 0]  # all rows of the first column
Y = data.iloc[:, 1]  # all rows of the second column
plt.scatter(X, Y)
plt.show()
n = len(X)

a = 0       # slope
b = 0       # intercept
L = .001    # learning rate

for i in range(1000):
    y_predicted = a * X + b
    pd_a = (1 / n) * sum((y_predicted - Y) * X)  # partial derivative w.r.t. a
    pd_b = (1 / n) * sum(y_predicted - Y)        # partial derivative w.r.t. b
    a = a - L * pd_a
    b = b - L * pd_b
print(a, b)
plt.scatter(X, Y)
c, d = np.polyfit(X, Y, 1)
print(c, d)
xs = [min(X), max(X)]
plt.plot(xs, [a * x + b for x in xs])  # gradient descent line
plt.plot(xs, [c * x + d for x in xs])  # polyfit reference line
plt.show()

If I instead define X and Y with np.random.rand(20), then everything seems to work fine, so the issue appears to be with the input from the csv. However, the scatter plot of X and Y still looks fine even when I define them as the first and second columns of my data set, so I'm not sure what's going on.
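To make that observation reproducible, here is a minimal self-contained sketch (the synthetic data, seed, learning rate, and iteration count are my own choices, not the question's): the same update rule converges to essentially the np.polyfit line when X and Y are plain numeric arrays.

```python
import numpy as np

# Synthetic stand-in for the csv data (assumed, since I don't have the file).
rng = np.random.default_rng(0)
X = rng.random(20)
Y = 2 * X + 1 + 0.01 * rng.standard_normal(20)

a, b = 0.0, 0.0
L = 0.1          # larger step than the question's .001, so it converges in fewer iterations
n = len(X)
for _ in range(5000):
    y_pred = a * X + b
    a -= L * (1 / n) * np.sum((y_pred - Y) * X)
    b -= L * (1 / n) * np.sum(y_pred - Y)

c, d = np.polyfit(X, Y, 1)
print(a, b)  # close to c, d
print(c, d)
```

With numeric arrays the two fits agree to several decimal places, which suggests the gradient descent itself is fine and the difference lies in how the csv columns come in.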

Edit: here is an image of the scatterplot after defining X = data.iloc[:, 0] and Y = data.iloc[:, 1]:


Here is an image of the plot and line at the end of the code.


The result of print(data.head()):


Edit: reading just one line of the csv:


1 Answer

user23463397:

Since I don't have the csv, I would do the following to troubleshoot why reading from the csv does not work.

Assumption: there are two values per line in the csv, so we build X and Y as lists with the loop below:

# Iterating over a DataFrame yields column names, not rows,
# so read the raw file and split each line instead
# (assuming a comma-delimited file).
X, Y = [], []
with open('C:/Users/Teacher/Downloads/data.csv') as f:
    for line in f:
        values = line.split(',')
        X.append(float(values[0]))
        Y.append(float(values[1]))
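A quicker check of the same suspicion (my suggestion, not part of the loop above) is to look at the dtypes pandas assigned: if a column came in as object rather than a numeric dtype, the arithmetic in the descent loop misbehaves even though the scatter plot still renders. The csv text below is a hypothetical stand-in for data.csv.

```python
import pandas as pd
from io import StringIO

# Hypothetical two-column csv content standing in for data.csv.
csv_text = "32.5,31.7\n53.4,68.8\n61.5,62.6\n"

# header=None stops read_csv from consuming the first data row as column labels.
data = pd.read_csv(StringIO(csv_text), header=None)

print(data.dtypes)  # both columns should show a numeric dtype such as float64
print(data.head())
```

If data.dtypes shows object for either column, the csv likely contains stray text (a header row, units, or thousands separators) that should be cleaned or handled with read_csv options.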