Theano stochastic gradient descent NaN output


I am using Theano stochastic gradient descent to solve a minimization problem. When I run my code, the first iterations seem to work, but after a while, and all of a sudden, the optimized parameter (eta) becomes NaN (as do the derivatives g_eta). It seems to be a Theano technical issue rather than a bug in my code, since I have checked the code in several different ways.

Does anyone have an idea of what the reason could be? My code is the following:

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

n_exp = 4
n_i = X_comb_I.shape[0]  # number of rows in the feature matrix (used in the reshape below)
features = theano.shared(value=X_comb_I, name='features', borrow=True)

x = T.dmatrix()   # per-expert scores, shape (n_i, n_exp)
y = T.ivector()   # integer labels

# Initialise eta from a uniform random draw of shape (64, n_exp)
srng = RandomStreams()
rv_u = srng.uniform((64, n_exp))
eta = theano.shared(value=rv_u.eval(), name='eta', borrow=True)

# Row-wise softmax over the n_exp columns of dot(features, eta):
# each row of g sums to one and weights the corresponding row of x.
ndotx = T.exp(T.dot(features, eta))
g = ndotx / T.reshape(T.repeat(T.sum(ndotx, axis=1), n_exp, axis=0), [n_i, n_exp])
my_score_given_eta = T.sum(g * x, axis=1)

cost = T.mean(T.abs_(my_score_given_eta - y))  # mean absolute error

g_eta = T.grad(cost=cost, wrt=eta)

learning_rate = 0.5

updates = [(eta, eta - learning_rate * g_eta)]  # plain SGD step

train_set_x = theano.shared(value=score, name='train_set_x', borrow=True)
train_set_y = theano.shared(value=labels.astype(np.int32), name='train_set_y', borrow=True)

train = theano.function(inputs=[],
                        outputs=cost,
                        updates=updates,
                        givens={x: train_set_x, y: train_set_y})

validate = theano.function(inputs=[],
                           outputs=cost,
                           givens={x: train_set_x, y: train_set_y})

train_monitor = []
val_monitor = []

n_epochs = 1000

for epoch in range(n_epochs):
    loss = train()
    train_monitor.append(validate())

    if epoch % 2 == 0:
        print "Iteration: ", epoch
        print "Training error: ", train_monitor[-1]  #, val_monitor[-1]

Thank you!


1 Answer

Answer from Alexander McFarlane:

The fact that you get the same problem, only more slowly, with a smaller learning rate suggests that your function has an instability that blows up near where you start SGD.
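
One way to confirm this is to recompile the training function under Theano's NanGuardMode, which raises an error naming the first op that produces a NaN or Inf, so you can see exactly where the graph blows up (a minimal sketch reusing the train definition from the question):

from theano.compile.nanguardmode import NanGuardMode

# Same compiled function as in the question, but every op's output is
# checked; Theano raises as soon as a NaN, Inf or implausibly large
# value appears, and reports the offending op.
train = theano.function(inputs=[], outputs=cost, updates=updates,
                        givens={x: train_set_x, y: train_set_y},
                        mode=NanGuardMode(nan_is_error=True,
                                          inf_is_error=True,
                                          big_is_error=True))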

  1. Try different starting values (e.g. a smaller, centred initialisation).
  2. Adjust your cost function to penalise the nasty region that is blowing up.
  3. Try a different gradient descent method, or bound each step with gradient clipping (see the sketch after this list).
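
A sketch of all three, as drop-in replacements for the corresponding lines in your code (the initialisation scale, penalty weight, clipping threshold and step size are illustrative numbers, not tuned values):

# 1. Smaller, centred starting values keep T.exp(T.dot(features, eta))
#    from overflowing in the first few steps (0.01 is illustrative).
eta = theano.shared(value=0.01 * np.random.randn(64, n_exp),
                    name='eta', borrow=True)

# 2. An L2 penalty on eta discourages it from drifting into the region
#    where the exponential overflows (the 1e-3 weight is illustrative).
cost = T.mean(T.abs_(my_score_given_eta - y)) + 1e-3 * T.sum(eta ** 2)

# 3. Clip the gradient and take smaller steps, so a locally exploding
#    gradient cannot produce an unbounded update.
g_eta = T.grad(cost=cost, wrt=eta)
g_eta = T.clip(g_eta, -1.0, 1.0)
updates = [(eta, eta - 0.05 * g_eta)]

Note also that your g computation is a row-wise softmax, so T.nnet.softmax(T.dot(features, eta)) is the numerically stable built-in equivalent and avoids the raw T.exp overflow altogether.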