I am trying to build a DQN. I have a convolutional neural network whose input has shape None x WIDTH x HEIGHT x FRAME_COUNT. The FULLY_CONNECTED_SIZE constant is chosen so that the output has shape [3] for an input of shape 1 x WIDTH x HEIGHT x FRAME_COUNT.
FULLY_CONNECTED_SIZE = (WIDTH // 8) * (HEIGHT // 8) * 32  # integer division; assumes WIDTH and HEIGHT are divisible by 8
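The factor of 8 comes from the two SAME-padded convolutions below (stride 4, then stride 2, so the spatial dimensions shrink by 4 * 2 = 8), and 32 is the channel count of the second convolution. A quick check with hypothetical values WIDTH = HEIGHT = 80:

WIDTH, HEIGHT = 80, 80  # hypothetical example values, divisible by 8
# conv1, stride 4, SAME padding: 80 -> 20 (16 channels)
# conv2, stride 2, SAME padding: 20 -> 10 (32 channels)
FULLY_CONNECTED_SIZE = (WIDTH // 8) * (HEIGHT // 8) * 32  # 10 * 10 * 32 = 3200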
import numpy as np
import tensorflow as tf

def createNetwork(self):
    input_layer = tf.placeholder("float", [None, WIDTH, HEIGHT, FRAME_COUNT])

    # First convolution: 8x8 filters, stride 4, FRAME_COUNT -> 16 channels.
    # Note: tf.constant broadcasts the single scalar drawn by np.random.uniform,
    # so every element of each of these tensors starts at the same value.
    conv_layer_1_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[8, 8, FRAME_COUNT, 16]))
    conv_layer_1_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[16]))
    conv_layer_1 = tf.nn.relu(tf.nn.conv2d(input_layer, filter=conv_layer_1_weights, strides=[1, 4, 4, 1], padding='SAME') + conv_layer_1_biases)

    # Second convolution: 4x4 filters, stride 2, 16 -> 32 channels.
    conv_layer_2_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[4, 4, 16, 32]))
    conv_layer_2_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[32]))
    conv_layer_2 = tf.nn.relu(tf.nn.conv2d(conv_layer_1, filter=conv_layer_2_weights, strides=[1, 2, 2, 1], padding='SAME') + conv_layer_2_biases)

    # Flatten and feed through a 256-unit fully connected layer.
    reshaped_layer = tf.reshape(conv_layer_2, [-1, FULLY_CONNECTED_SIZE])
    fully_connected_layer_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[FULLY_CONNECTED_SIZE, 256]))
    fully_connected_layer_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[256]))
    fully_connected_layer = tf.nn.relu(tf.matmul(reshaped_layer, fully_connected_layer_weights) + fully_connected_layer_biases)

    # Linear output layer: one Q-value per action.
    output_layer_weights = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[256, NUMBER_OF_ACTIONS]))
    output_layer_biases = tf.Variable(tf.constant(np.random.uniform(-1, 1), shape=[NUMBER_OF_ACTIONS]))
    output_layer = tf.matmul(fully_connected_layer, output_layer_weights) + output_layer_biases

    return input_layer, output_layer
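As a quick sanity check of the shapes (a sketch; agent stands in for whatever object owns createNetwork, and the all-zeros dummy input is just placeholder data):

input_layer, output_layer = agent.createNetwork()
print(output_layer.get_shape())  # (?, NUMBER_OF_ACTIONS)

session = tf.Session()
session.run(tf.initialize_all_variables())
dummy = np.zeros((1, WIDTH, HEIGHT, FRAME_COUNT))
print(session.run(output_layer, feed_dict={input_layer: dummy}).shape)  # (1, NUMBER_OF_ACTIONS)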
And I train it like this:
self.inputQ, self.outputQ = self.createNetwork()
self._session = tf.Session()

# self._action is expected to be a one-hot mask per transition,
# so the sum below picks out the Q-value of the chosen action.
self._action = tf.placeholder("float", [None, NUMBER_OF_ACTIONS])
self._target = tf.placeholder("float", [None])
readout_action = tf.reduce_sum(tf.mul(self.outputQ, self._action), reduction_indices=1)

# Mean squared error between the TD target and the chosen action's Q-value.
cost = tf.reduce_mean(tf.square(self._target - readout_action))
self._train_operation = tf.train.GradientDescentOptimizer(1e-4).minimize(cost)
self._session.run(tf.initialize_all_variables())
...
self._session.run(self._train_operation,
                  feed_dict={self._target: targets, self._action: actions, self.inputQ: before_states})
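For context, here is one way the targets and actions fed above could be built from a replay batch. This is only a sketch of the standard DQN target r + GAMMA * max_a' Q(s', a'); the names GAMMA, rewards, chosen_actions, after_states, terminal, and make_batch are mine, not from the code above:

GAMMA = 0.99  # hypothetical discount factor

def make_batch(self, rewards, chosen_actions, after_states, terminal):
    # One-hot action masks, one row per transition.
    n = len(chosen_actions)
    actions = np.zeros((n, NUMBER_OF_ACTIONS))
    actions[np.arange(n), chosen_actions] = 1.0

    # DQN target: r for terminal transitions, r + GAMMA * max_a' Q(s', a') otherwise.
    max_next_q = self.Q(after_states).max(axis=1)
    targets = np.asarray(rewards) + GAMMA * max_next_q * (1.0 - np.asarray(terminal, dtype=float))
    return actions, targets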
before_states is an array of N states, where each state is a stack of FRAME_COUNT binary images of size WIDTH x HEIGHT (1 means a white pixel, 0 a black one), so the total shape is N x WIDTH x HEIGHT x FRAME_COUNT. I also have a Q-function:
def Q(self, states):
    return self._session.run(self.outputQ, feed_dict={self.inputQ: states})
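For example, with a single synthetic state (a sketch; agent is a stand-in for the object that owns Q, and the random binary frames are just placeholder data):

state = np.random.randint(0, 2, size=(WIDTH, HEIGHT, FRAME_COUNT)).astype(np.float32)
q_values = agent.Q([state])  # feeds a batch of shape 1 x WIDTH x HEIGHT x FRAME_COUNT
print(q_values.shape)        # (1, NUMBER_OF_ACTIONS)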
My problem:

- Before any training, Q([state]) differs for each state (state being FRAME_COUNT images of size WIDTH x HEIGHT), so the network with input shape 1 x WIDTH x HEIGHT x FRAME_COUNT works as expected.
- After the first training step, Q([state]) = Q1, the same value for every possible state.
- After the second training step, Q([state]) = Q2, the same value for every possible state.
- ...
- After the n-th training step, Q([state]) = Qn, the same value for every possible state.
Why is this happening? The output of the neural network should be different for each input state. What should I do in this situation? I have already tried different learning rates, optimization methods (gradient descent, Adam), and initial weights.
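One detail in the code above that may be worth double-checking (noted as an observation, not a confirmed fix): tf.constant(np.random.uniform(-1, 1), shape=[...]) draws a single scalar and broadcasts it, so every element of a weight tensor starts at the same value. A per-element random initialization would look like this sketch:

# Sketch: each weight gets its own random draw instead of one shared scalar.
conv_layer_1_weights = tf.Variable(tf.truncated_normal([8, 8, FRAME_COUNT, 16], stddev=0.01))
conv_layer_1_biases = tf.Variable(tf.constant(0.01, shape=[16]))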