I am using TensorFlow in an online learning setting. My cost function is implemented as:
cost = tf.sqrt(tf.reduce_mean(tf.square(tf.sub(Y, output))))
The optimization step is defined like this:
train_op = tf.train.GradientDescentOptimizer(0.0001).minimize(cost, name="GradientDescent")
And I run stochastic gradient descent like this:
m, i = sess.run([merged, train_op], feed_dict={X: input_batch, Y: label_batch})
where input_batch and label_batch each contain only a single vector.
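Since each batch holds only one example (N = 1), the RMSE cost above reduces to the absolute error of that single example:

$$\text{cost} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(Y_i - \text{output}_i\bigr)^2} \;\xrightarrow{\;N=1\;}\; \lvert Y - \text{output}\rvert$$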
So how can I interpret a cost curve like this:

Is this good progress for a stochastic approach? Why does the gap get bigger?
I train the network 50,000 times with the same 50 training examples, cycling through them one at a time, so each example is used for training about 1,000 times and recurs every 50th step.
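For clarity, the training loop looks roughly like this (simplified sketch; data and labels stand in for my 50 stored input vectors and their targets, everything else is defined as above):

for step in range(50000):
    idx = step % 50                          # cycle through the 50 training examples
    input_batch = data[idx:idx + 1]          # a "batch" of exactly one input vector
    label_batch = labels[idx:idx + 1]        # and its matching label
    m, i = sess.run([merged, train_op],
                    feed_dict={X: input_batch, Y: label_batch})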
I have already tried changing the learning rate by a factor of 10 in both directions. This question is related to my other question: Does Stochastic Gradient Descent even work with TensorFlow?
Thanks for any hints.