Is Levenberg–Marquardt a type of Backpropagation algorithm or is it a different category of algorithm?
Wikipedia says that it is a curve fitting algorithm. How is a curve fitting algorithm relevant to a Neural Net?
In terms of neural networks, gradient descent is what is usually called backpropagation. Gradient descent is a method to find a set of parameters that minimizes a function. If that function is the mean squared error, it's a regression and hence a curve fitting problem, because one wants to find a set of parameters that minimizes the squared difference between the output of a function and some training data.
Think of the training data as some points in a space: you want to learn a function that minimizes the squared distance to these points. The German Wikipedia page for least squares gives a graphical explanation; the blue dots are your training data and the red curve is the function you want to learn.
A neural network is, in general, a function approximator. The Universal Approximation Theorem says that a feed-forward net with one hidden layer and a finite number of neurons is able to approximate any continuous function. So you can think of a neural net as a template for any continuous function, where you just have to find a set of weights that approximates the function for a given objective. In this case, the objective is to minimize the squared difference between some data points and the function, which can be written as the following equation:
$$\hat{\beta} = \operatorname*{argmin}_{\beta} \sum_i \big(y_i - f(x_i, \beta)\big)^2$$

Here, $f(x_i, \beta)$ is the neural network, $\beta$ the parameter set and $y_i$ the data points. $\operatorname*{argmin}_{\beta}$ means that we want to find the set of parameters that minimizes the output of the objective function, i.e. the set with the minimum squared error between the expected output of the network $f(x_i, \beta)$ and $y_i$ over all data points.
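To make this concrete, here is a minimal NumPy sketch (not from the original answer) of this least-squares objective for a tiny one-hidden-layer tanh network on made-up noisy sine data; the packing of the weights into a single vector `beta` and all sizes are assumptions made just for the example:

```python
import numpy as np

# Toy training data: noisy samples from a sine curve (made-up example).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)

def f(x, beta):
    """One-hidden-layer tanh network with 5 units; weights packed into beta."""
    W1, b1, W2, b2 = beta[:5], beta[5:10], beta[10:15], beta[15]
    h = np.tanh(np.outer(x, W1) + b1)   # hidden activations, shape (n, 5)
    return h @ W2 + b2                  # one scalar output per data point

def objective(beta):
    """Sum of squared differences between the network output and the data."""
    return np.sum((y - f(x, beta)) ** 2)

beta0 = 0.5 * rng.standard_normal(16)
print(objective(beta0))  # squared error of a random parameter set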
So, how do we achieve this objective? Usually it's done by backpropagation, or gradient descent to be more precise. The update equation is the following:
$$\beta_{n+1} = \beta_n - \gamma \, \nabla_{\beta} \sum_i \big(y_i - f(x_i, \beta_n)\big)^2$$

Here, we update the parameter vector $\beta$ at time step $n+1$ by taking its value from the previous time step $n$ and subtracting the learning rate $\gamma$ times the gradient of the objective with respect to $\beta$. It's called backpropagation because in neural networks the gradient for each weight depends on the weights of the later layers, so the error is propagated backwards through the network layer by layer.
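As a hedged sketch of this update rule (again not part of the original answer), here is gradient descent for a simple line fit $f(x, \beta) = \beta_0 x + \beta_1$ standing in for the network; the data, learning rate and iteration count are all made up:

```python
import numpy as np

# Made-up data for a line fit y ≈ beta[0]*x + beta[1].
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.shape)

def gradient(beta):
    """Gradient of the objective sum((y - (beta0*x + beta1))**2) w.r.t. beta."""
    residual = y - (beta[0] * x + beta[1])
    return np.array([-2.0 * np.sum(residual * x), -2.0 * np.sum(residual)])

gamma = 0.01                        # learning rate
beta = np.zeros(2)
for n in range(2000):               # beta_{n+1} = beta_n - gamma * gradient
    beta = beta - gamma * gradient(beta)
print(beta)                         # approaches the true parameters [2.0, 1.0]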
The Levenberg-Marquardt algorithm (LM) is a combination of the gradient descent algorithm and the Gauss-Newton method, with a slightly more complicated update equation in matrix notation:

$$\big(J^\top J + \lambda I\big)\,\delta = J^\top \big(y - f(\beta)\big)$$

Here, $J$ is the Jacobian of $f$ with respect to $\beta$, $I$ the identity matrix, $\lambda$ a damping factor and $\delta$ the parameter update that is added to $\beta$. When $\lambda$ is small, the algorithm behaves like the Gauss-Newton method; with a high value of $\lambda$, it behaves like gradient descent.
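For illustration, here is a sketch of this update on the same made-up line-fit problem as above, assuming a fixed damping factor rather than the adaptive $\lambda$ schedule a full LM implementation would use:

```python
import numpy as np

# Same made-up line-fit problem as in the gradient descent sketch above.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
y = 2.0 * x + 1.0 + 0.05 * rng.standard_normal(x.shape)

def model(beta):
    return beta[0] * x + beta[1]

def jacobian(beta):
    """Jacobian of the model output with respect to beta[0] and beta[1]."""
    return np.column_stack([x, np.ones_like(x)])

beta = np.zeros(2)
lam = 1e-3                                   # damping factor lambda
for _ in range(10):
    r = y - model(beta)                      # residuals
    J = jacobian(beta)
    # Solve (J^T J + lambda * I) delta = J^T r and take the step.
    delta = np.linalg.solve(J.T @ J + lam * np.eye(2), J.T @ r)
    beta = beta + delta
print(beta)                                  # close to [2.0, 1.0]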
So you can think of LM as another algorithm for solving least-squares problems, which can also be employed for neural networks. Rumelhart et al. used the term "backpropagation" when employing the gradient descent method in their original paper, Learning representations by back-propagating errors.
Backpropagation is the idea of calculating local error gradients for each weight. This idea is implemented in a variety of ways: the standard backpropagation algorithm is in fact gradient descent, while LM uses the idea of backpropagation in the calculation of the Jacobian.
LM typically converges faster than gradient-descent backpropagation. In fact, there are other quasi-Newton algorithms that also use backpropagation for calculating their Jacobians and Hessians; even the Kalman filter could use it.
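As an illustrative example (not from either answer), the weights of a small network can be fitted with SciPy's Levenberg-Marquardt solver via `scipy.optimize.least_squares(..., method='lm')`; the architecture and data below are made up, and the Jacobian is estimated by finite differences here, whereas a neural-network implementation would compute it with backpropagation:

```python
import numpy as np
from scipy.optimize import least_squares

# Made-up data: noisy samples from a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)

def net(beta, x):
    """One-hidden-layer tanh network with 5 units; weights packed into beta."""
    W1, b1, W2, b2 = beta[:5], beta[5:10], beta[10:15], beta[15]
    return np.tanh(np.outer(x, W1) + b1) @ W2 + b2

def residuals(beta):
    return y - net(beta, x)

# method='lm' selects SciPy's Levenberg-Marquardt solver (MINPACK).
result = least_squares(residuals, x0=0.5 * rng.standard_normal(16), method='lm')
print(result.cost)   # 0.5 * sum of squared residuals after fitting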