Slow Adagrad Convergence


I'm working on a comparison of popular gradient descent algorithms in Python. Here is a link to the notebook I've got going.

The Adagrad algorithm converges at a much slower rate than the plain vanilla batch, stochastic, and mini-batch algorithms. I was expecting it to be an improvement over the basic methods. Is the difference attributable to one or more of the factors below, to something else, or is this the expected result?

  1. The test data set is small and Adagrad performs relatively better on larger data sets
  2. Something having to do with the characteristics of the sample data
  3. Something having to do with the parameters (see the sketch after this list)
  4. An error in the code
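
Regarding point 3, here is a minimal sketch of how the effective per-parameter step size, alpha / (1e-6 + sqrt(grad_hist)), behaves over iterations. The synthetic X, y, and alpha values are made up for illustration and are not from the notebook. Because grad_hist only ever grows, the step size only ever shrinks, so with a fixed alpha the later epochs can move very slowly.

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 2))])  # intercept + 2 features
y = X.dot(np.array([1.0, 2.0, -3.0])) + rng.normal(scale=0.1, size=200)

alpha = 0.1
theta = np.ones(3)
grad_hist = np.zeros(3)
for epoch in range(150):
    gradient = X.T.dot(X.dot(theta) - y) / X.shape[0]   # full-batch gradient for simplicity
    grad_hist += np.square(gradient)
    step = alpha / (10**-6 + np.sqrt(grad_hist))        # effective per-parameter step size
    theta -= step * gradient
    if epoch % 30 == 0:
        print(epoch, step)   # the step size only decreases, since grad_hist never shrinks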

Here is the Adagrad code; it is also the last algorithm in the notebook.

import numpy as np

def gd_adagrad(data, alpha, num_iter, b=1):
    # data: features in all columns except the last, target in the last column
    m, N = data.shape
    Xy = np.ones((m, N + 1))           # prepend an intercept column of ones
    Xy[:, 1:] = data
    theta = np.ones(N)
    grad_hist = 0                      # running sum of squared gradients
    for i in range(num_iter):
        np.random.shuffle(Xy)
        batches = np.split(Xy, np.arange(b, m, b))   # mini-batches of size b
        for B_x, B_y in ((B[:, :-1], B[:, -1]) for B in batches):
            loss_B = B_x.dot(theta) - B_y
            gradient = B_x.T.dot(loss_B) / B_x.shape[0]
            grad_hist += np.square(gradient)
            # per-parameter step: alpha scaled down by the accumulated gradient history
            theta = theta - alpha * gradient / (10**-6 + np.sqrt(grad_hist))
    return theta

theta = gd_adagrad(data_norm, alpha*10, 150, 50)
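
To compare the fits returned by the different algorithms, I use a small helper in the same style; it is a sketch that assumes the same data layout as gd_adagrad (intercept column prepended, target in the last column), and data_norm is the variable name from the notebook.

import numpy as np

def mse(theta, data):
    # Rebuild the design matrix exactly as gd_adagrad does: a column of ones
    # for the intercept, then the feature columns; the last column is the target.
    m, N = data.shape
    Xy = np.ones((m, N + 1))
    Xy[:, 1:] = data
    X, y = Xy[:, :-1], Xy[:, -1]
    return np.mean((X.dot(theta) - y) ** 2)

print(mse(theta, data_norm))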