I'm a Julia user new to Flux and machine learning. As a first test, and to understand how Flux works, I tried using Flux to estimate a simple linear regression model. But clearly I'm doing something wrong, as training the model using `train!` does not give me the expected OLS coefficients. This surprised me; since linear regression is a simple convex optimization problem, I would have expected gradient descent to quickly converge to the optimum. So I assume I've misunderstood something about how `train!` works.
Here's my code:
```julia
using Flux
using Flux: @epochs
using GLM

# Load data: the features of the Iris data set
features = Flux.Data.Iris.features();
x = features[1:3, :];
y = features[4, :];
J, N = size(x);  # number of explanatory variables, number of observations

model = Chain(Dense(J, 1));                  # define the model
loss(x, y) = Flux.Losses.mse(model(x), y);   # define the loss function

function loss_all(X, y)                      # and define a full-sample loss function
    l = 0;
    for i in 1:length(y)
        l += loss(X[:, i], y[i]);
    end
    return l
end

loss_all(x, y)
@epochs 10000 Flux.train!(loss, params(model), [(x, y)], Descent(0.01));  # train the model
loss_all(x, y)

# How does the result compare to OLS (should be exactly the same)?
x_augmented = vcat(ones(1, N), x);
ols = inv(x_augmented * transpose(x_augmented)) * x_augmented * y
y_hat = transpose(x_augmented) * ols;
sse = sum((y_hat - y) .^ 2)
```
I expect I'm making a silly mistake, but I would really appreciate it if someone could help me identify the problem.
The easiest way to fix it is to make sure `y` has a shape matching `model(x)`:

```julia
y = features[4:4, :];   # a 1×150 matrix rather than a 150-element vector
```

Note that:

```julia
mse(ŷ, y; agg = mean)
```

expands to:

```julia
agg((ŷ .- y) .^ 2)
```
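Because `agg` defaults to `mean`, the call goes through without any error even when the shapes disagree. Here is a small demonstration of my own (not from the original post) of what the broadcast actually computes:

```julia
using Flux

ŷ = ones(1, 3);          # stand-in predictions, shape (1, 3)
y = [1.0, 2.0, 3.0];     # targets, shape (3,)

size((ŷ .- y) .^ 2)      # (3, 3): broadcasting produced a matrix, not 3 elementwise errors

Flux.Losses.mse(ŷ, y)    # ≈ 1.667: silently averages all 9 entries of that matrix
```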
So `model(x)` and `y` should have the same shape (after my fix they are `(1, 150)`).
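As a quick sanity check (my addition, reusing the data and model from the question), the shapes now line up:

```julia
size(model(x)), size(y)   # ((1, 150), (1, 150)) once y = features[4:4, :]
```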
In your original code it was `(1, 150)` vs `(150,)`, which meant that the dimensions got broadcasted to `(150, 150)` after `(model(x) .- y).^2`.
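For completeness, here is a minimal end-to-end sketch of the corrected version (my own addition, not from the original post; it assumes the same older Flux version as the question, where `@epochs`, `params`, and `Flux.Data.Iris` are available). With matching shapes, plain gradient descent should drive the training loss toward the OLS optimum, though how close it gets depends on the learning rate and the number of epochs:

```julia
using Flux
using Flux: @epochs

features = Flux.Data.Iris.features();
x = features[1:3, :];
y = features[4:4, :];                        # 1×150, matching model(x)

model = Chain(Dense(3, 1));
loss(x, y) = Flux.Losses.mse(model(x), y);
@epochs 10000 Flux.train!(loss, params(model), [(x, y)], Descent(0.01));

# Compare against the closed-form OLS solution from the question.
N = size(x, 2);
x_augmented = vcat(ones(1, N), x);
ols = inv(x_augmented * transpose(x_augmented)) * x_augmented * vec(y);
sse = sum((transpose(x_augmented) * ols - vec(y)) .^ 2);

# OLS minimizes the same objective, so the trained MSE should be close to sse / N.
Flux.Losses.mse(model(x), y), sse / N
```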