I'm trying to add an attention layer to a Seq2Seq model, but I get an InvalidArgumentError when fitting on the training set. The error comes from the concatenation step, where the decoder output and the attention output are concatenated.
The error message is:
Dimensions of inputs should match: shape[0] = [32,15,300] vs. shape[1] = [32,32,300]
My understanding is that the first 32 is the batch size, the second item is the sequence length, and 300 is the number of units. But why does shape[1] also have 32 as its second item?
Below is my code; any insights would be very helpful.
WORD2VEC_DIMS = 50
DICTIONARY_SIZE = num_tokens
units = 300
ADAM = Adam(lr=0.00005)
MAX_LEN = 15 
drop_out_rate= 0.2
encoder_inputs_att = Input(shape=( MAX_LEN , ))
encoder_embedding_att = embedding_layer_encoder(encoder_inputs_att)
encoder_embedding_att=layers.SpatialDropout1D(drop_out_rate)(encoder_embedding_att)
encoder_outputs_att , state_h_att , state_c_att = LSTM( units , return_state=True )( encoder_embedding_att )
encoder_states_att = [ state_h_att , state_c_att ]
decoder_inputs_att = Input(shape=( MAX_LEN ,  ))
decoder_embedding_att = embedding_layer_decoder(decoder_inputs_att)
decoder_lstm_att = LSTM( units , return_state=True , return_sequences=True )
decoder_outputs_att , state_dec_h_att , state_dec_c_att = decoder_lstm_att ( decoder_embedding_att , initial_state=encoder_states_att )
# add attention
attn_layer_att = Attention(name='attention_layer', causal = True)
attn_out_att = attn_layer_att([encoder_outputs_att, decoder_outputs_att])
#decoder_outputs_att = tf.keras.layers.GlobalAveragePooling1D()(decoder_outputs_att)
#attn_out_att = tf.keras.layers.GlobalAveragePooling1D()(attn_out_att)
decoder_concat_input_att = Concatenate(axis=-1, name='concat_layer')([decoder_outputs_att, attn_out_att])
decoder_dense_att = Dense( DICTIONARY_SIZE , activation="softmax" ) 
# add time distributed
dense_time_att = TimeDistributed(decoder_dense_att, name='time_distributed_layer')
output_att = dense_time_att ( decoder_concat_input_att )
#output = tf.cast(tf.keras.backend.argmax(output), tf.float64)
output_att = tf.cast(output_att,tf.float64)
model_att = tf.keras.models.Model([encoder_inputs_att, decoder_inputs_att], output_att )
model_att.compile(optimizer=ADAM, loss='sparse_categorical_crossentropy')
model_att.summary()
model_att.fit([x_train, y_train], y_train_decoded, batch_size = 32, epochs = 50, validation_split=0.1, shuffle=True)
 
                        
You provided the arguments to the attention layer in the opposite order. It should be:
attn_out_att = attn_layer_att([decoder_outputs_att, encoder_outputs_att])
From the tf.keras.layers.Attention documentation:

inputs: List of the following tensors:
- query: Query Tensor of shape [batch_size, Tq, dim].
- value: Value Tensor of shape [batch_size, Tv, dim].
- key: Optional key Tensor of shape [batch_size, Tv, dim]. If not given, will use value for both key and value, which is the most common case.

In the case of a seq2seq model, you can think of attention as a probabilistic retrieval of information from the encoder by the decoder. At every decoding step, the decoder collects whatever is relevant from the encoder. Therefore, the decoder states are used as the queries and the encoder states are the retrieved values.
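To make the shape rule concrete, here is a minimal NumPy sketch of plain dot-product attention. It is not the Keras implementation: `dot_product_attention` is a hypothetical helper that assumes unscaled scores and uses the value as the key. The encoder length of 20 is likewise an assumption, chosen only so that Tv is distinguishable from Tq. The output always takes its time dimension from the query, so with the decoder states as the query the result lines up with the decoder output for concatenation. The second call mimics the question's setup, where the encoder LSTM has no `return_sequences=True` and therefore yields a 2-D tensor. Under NumPy's matmul broadcasting the batch axis then ends up in the time position, which appears consistent with the [32,32,300] in the error, although this has not been traced through Keras itself.

```python
import numpy as np

def dot_product_attention(query, value):
    """Unscaled dot-product attention; the key is taken to be the value."""
    # Scores compare every query step with every value step: [batch, Tq, Tv].
    scores = np.matmul(query, np.swapaxes(value, -1, -2))
    # Numerically stable softmax over the value (Tv) axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values: [batch, Tq, dim].
    return np.matmul(weights, value)

# Sizes follow the question, except enc_len, which is an illustrative assumption.
batch, enc_len, dec_len, units = 32, 20, 15, 300
rng = np.random.default_rng(0)
encoder_seq = rng.normal(size=(batch, enc_len, units))  # like return_sequences=True
decoder_seq = rng.normal(size=(batch, dec_len, units))
encoder_last = encoder_seq[:, -1, :]                    # 2-D, like the question's encoder output

# Decoder states as query, encoder states as value.
context = dot_product_attention(decoder_seq, encoder_seq)
print(context.shape)   # (32, 15, 300)

# Swapped order with a 2-D query, as in the question.
swapped = dot_product_attention(encoder_last, decoder_seq)
print(swapped.shape)   # (32, 32, 300)

# Only the first result can be concatenated with the decoder output.
combined = np.concatenate([decoder_seq, context], axis=-1)
print(combined.shape)  # (32, 15, 600)
```

If this sketch reflects what happens in the model, the encoder LSTM would also need `return_sequences=True` so that the value tensor has the [batch_size, Tv, dim] shape the documentation describes.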