I am new to machine learning in Python, and I am trying to debug an error I am getting while manipulating the sizes and shapes of layers for a simple DQN model that guesses letters in a game.
states =
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def build_model():
    model = Sequential()
    model.add(Dense(30, activation='relu', input_shape=(26, 41)))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(26, activation='linear'))
    return model

model = build_model()
model.summary()
The code above gives the following model summary:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 26, 30) 1260
dense_2 (Dense) (None, 26, 30) 930
dense_3 (Dense) (None, 26, 26) 806
=================================================================
Total params: 2,996
Trainable params: 2,996
Non-trainable params: 0
_________________________________________________________________
However, running the model gives the following error in the terminal:
Model output "Tensor("dense_61/BiasAdd:0", shape=(None, 26, 26), dtype=float32)" has invalid shape. DQN expects a model that has one dimension for each action, in this case 26.
I see that my final dense layer is outputting (None, 26, 26), which is causing the error. Using Keras's Reshape layer, I can change the output to (None, 780, 1), which is still not what I'm looking for. My question is: why are the outputs of dense layers multidimensional rather than 1D if dense layers have no spatial structure, and how can the output shape be manipulated to return a 1D vector?
Dense implements the operation:
output = activation(dot(input, kernel) + bias)
If the input has shape (batch_size, d0, d1), then Dense creates a kernel with shape (d1, units), and the kernel operates along axis 2 of the input, on every sub-tensor of shape (1, 1, d1) (there are batch_size * d0 such sub-tensors). The output in this case will have shape (batch_size, d0, units).
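You can see this with plain NumPy, since dot(input, kernel) + bias is exactly what Dense computes (a sketch with made-up sizes matching your layer):

```python
import numpy as np

batch_size, d0, d1, units = 5, 26, 41, 30
x = np.random.rand(batch_size, d0, d1)
kernel = np.random.rand(d1, units)   # Dense only contracts the last axis
bias = np.zeros(units)

# np.dot of a 3D array with a 2D array multiplies along the last axis
# of x, so the leading (batch_size, d0) axes survive into the output.
output = np.dot(x, kernel) + bias
print(output.shape)  # (5, 26, 30)
```

This is why every Dense layer in your model keeps the extra 26 dimension: the layer never touches the leading axes of its input.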
In the above code, None will be replaced by the batch size when you pass actual data; summary() shows the shapes without any data being passed.
For example, if you have 100 images of size 26x24, your input would have shape (100, 26, 24). You have to preprocess your data so that it can be fed into the built model.
I am assuming you are using TensorFlow, so you can use tf.reshape() to reshape the outputs of the model. NumPy has an equivalent function, numpy.reshape().
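A small sketch of flattening the trailing axes with NumPy (tf.reshape() takes the same shape argument; -1 means "infer this dimension"):

```python
import numpy as np

out = np.zeros((1, 26, 26))       # one sample of the model's output
flat = np.reshape(out, (1, -1))   # collapse the last two axes: 26 * 26 = 676
print(flat.shape)  # (1, 676)
```

Note that flattening (None, 26, 26) gives 676 values per sample, so to end up with one value per action you would also need to change the model so its final layer produces 26 outputs in total, not just reshape afterwards.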