Theano stochastic gradient descent NaN output


I am using Theano stochastic gradient descent to solve a minimization problem. When I run my code, the first iterations seem to work, but after a while, and all of a sudden, the optimized parameter (eta) becomes NaN (as do the derivatives g_eta). It seems to be a Theano technical issue rather than a bug in my code, since I have checked the code in several different ways.

Does anyone have an idea of what the reason could be? My code is the following:

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

n_exp = 4
n_i = X_comb_I.shape[0]  # number of rows in the feature matrix (used in the reshape below)
features = theano.shared(value=X_comb_I, name='features', borrow=True)

x = T.dmatrix()   # per-expert scores, shape (n_i, n_exp)
y = T.ivector()   # integer labels

# Initialise eta from a uniform random draw of shape (64, n_exp)
srng = RandomStreams()
rv_u = srng.uniform((64, n_exp))
eta = theano.shared(value=rv_u.eval(), name='eta', borrow=True)

# Row-wise softmax over the n_exp columns of dot(features, eta):
# each row of g sums to one and weights the corresponding row of x.
ndotx = T.exp(T.dot(features, eta))
g = ndotx / T.reshape(T.repeat(T.sum(ndotx, axis=1), n_exp, axis=0), [n_i, n_exp])
my_score_given_eta = T.sum(g * x, axis=1)

cost = T.mean(T.abs_(my_score_given_eta - y))  # mean absolute error

g_eta = T.grad(cost=cost, wrt=eta)

learning_rate = 0.5

updates = [(eta, eta - learning_rate * g_eta)]  # plain SGD step

train_set_x = theano.shared(value=score, name='train_set_x', borrow=True)
train_set_y = theano.shared(value=labels.astype(np.int32), name='train_set_y', borrow=True)

train = theano.function(inputs=[],
                        outputs=cost,
                        updates=updates,
                        givens={x: train_set_x, y: train_set_y})

validate = theano.function(inputs=[],
                           outputs=cost,
                           givens={x: train_set_x, y: train_set_y})

train_monitor = []
val_monitor = []

n_epochs = 1000

for epoch in range(n_epochs):
    loss = train()
    train_monitor.append(validate())

    if epoch % 2 == 0:
        print "Iteration: ", epoch
        print "Training error: ", train_monitor[-1]  #, val_monitor[-1]

Thank you!


1 Answer

Answer from Alexander McFarlane:

The fact that you get the same problem, only more slowly, with a smaller learning rate suggests that your function has an instability that blows up near where you start SGD.
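
One way to confirm this is to recompile the training function under Theano's NanGuardMode, which raises an error naming the first op that produces a NaN or Inf, so you can see exactly where the graph blows up (a minimal sketch reusing the train definition from the question):

from theano.compile.nanguardmode import NanGuardMode

# Same compiled function as in the question, but every op's output is
# checked; Theano raises as soon as a NaN, Inf or implausibly large
# value appears, and reports the offending op.
train = theano.function(inputs=[], outputs=cost, updates=updates,
                        givens={x: train_set_x, y: train_set_y},
                        mode=NanGuardMode(nan_is_error=True,
                                          inf_is_error=True,
                                          big_is_error=True))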

  1. Try different starting values (e.g. a smaller, centred initialisation).
  2. Adjust your cost function to penalise the nasty region that is blowing up.
  3. Try a different gradient descent method, or bound each step with gradient clipping (see the sketch after this list).
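
A sketch of all three, as drop-in replacements for the corresponding lines in your code (the initialisation scale, penalty weight, clipping threshold and step size are illustrative numbers, not tuned values):

# 1. Smaller, centred starting values keep T.exp(T.dot(features, eta))
#    from overflowing in the first few steps (0.01 is illustrative).
eta = theano.shared(value=0.01 * np.random.randn(64, n_exp),
                    name='eta', borrow=True)

# 2. An L2 penalty on eta discourages it from drifting into the region
#    where the exponential overflows (the 1e-3 weight is illustrative).
cost = T.mean(T.abs_(my_score_given_eta - y)) + 1e-3 * T.sum(eta ** 2)

# 3. Clip the gradient and take smaller steps, so a locally exploding
#    gradient cannot produce an unbounded update.
g_eta = T.grad(cost=cost, wrt=eta)
g_eta = T.clip(g_eta, -1.0, 1.0)
updates = [(eta, eta - 0.05 * g_eta)]

Note also that your g computation is a row-wise softmax, so T.nnet.softmax(T.dot(features, eta)) is the numerically stable built-in equivalent and avoids the raw T.exp overflow altogether.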