I am using Theano stochastic gradient descent to solve a minimization problem. When I run my code, the first iterations seem to work, but after a while, all of a sudden, the optimized parameter (eta) becomes NaN (as do the derivatives g_eta). It seems to be a Theano technical issue rather than a bug in my code, since I have checked the code in several different ways.
Does anyone have an idea of what the reason could be? My code is the following:
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

# X_comb_I, score, and labels are my data arrays; n_i is the number of rows.
n_exp = 4
n_i = X_comb_I.shape[0]
features = theano.shared(value=X_comb_I, name='features', borrow=True)

x = T.dmatrix()
y = T.ivector()

# Initialize eta with uniform random values.
srng = RandomStreams()
rv_u = srng.uniform((64, n_exp))
eta = theano.shared(value=rv_u.eval(), name='eta', borrow=True)

# Softmax-like normalization of the feature projections.
ndotx = T.exp(T.dot(features, eta))
g = ndotx / T.reshape(T.repeat(T.sum(ndotx, axis=1), n_exp, axis=0), [n_i, n_exp])
my_score_given_eta = T.sum(g * x, axis=1)
cost = T.mean(T.abs_(my_score_given_eta - y))

g_eta = T.grad(cost=cost, wrt=eta)
learning_rate = 0.5
updates = [(eta, eta - learning_rate * g_eta)]

train_set_x = theano.shared(value=score, name='train_set_x', borrow=True)
train_set_y = theano.shared(value=labels.astype(np.int32), name='train_set_y', borrow=True)

train = theano.function(inputs=[], outputs=cost, updates=updates,
                        givens={x: train_set_x, y: train_set_y})
validate = theano.function(inputs=[], outputs=cost,
                           givens={x: train_set_x, y: train_set_y})

train_monitor = []
val_monitor = []

n_epochs = 1000
for epoch in range(n_epochs):
    loss = train()
    train_monitor.append(validate())
    if epoch % 2 == 0:
        print "Iteration: ", epoch
        print "Training error, validation error: ", train_monitor[-1]  #, val_monitor[-1]
Thank you!
The fact that you are getting the same problem, just more slowly, with a smaller learning rate suggests that you possibly have an instability in your function which blows up near where you start SGD.
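One common source of exactly this kind of blow-up, assuming the normalization in your code is meant to be a row-wise softmax, is the unguarded T.exp: once T.dot(features, eta) gets large, exp overflows to inf and inf/inf yields NaN (very negative scores can likewise underflow the row sums to zero). A minimal sketch of the standard max-subtraction fix, plus Theano's NanGuardMode to pinpoint the first bad value, would look like this (it reuses the names from your code):

from theano.compile.nanguardmode import NanGuardMode

# Numerically stable row-wise softmax: subtracting the row max before
# exponentiating leaves g unchanged but prevents overflow in exp.
scores = T.dot(features, eta)
scores = scores - T.max(scores, axis=1, keepdims=True)
ndotx = T.exp(scores)
g = ndotx / T.sum(ndotx, axis=1, keepdims=True)
# Equivalently: g = T.nnet.softmax(T.dot(features, eta))

# Compile with NanGuardMode so Theano raises at the first NaN/inf and
# reports which op produced it, instead of letting NaNs propagate silently.
train = theano.function(inputs=[], outputs=cost, updates=updates,
                        givens={x: train_set_x, y: train_set_y},
                        mode=NanGuardMode(nan_is_error=True,
                                          inf_is_error=True,
                                          big_is_error=True))

Independently of that, a learning rate of 0.5 is quite aggressive for a model with exponentials in it; dropping it to something like 0.01 and scaling back up is a cheap way to confirm whether the divergence is step-size driven.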