Concatenate layer shape error in sequence-to-sequence model with Keras attention


I'm trying to implement a simple word-level sequence-to-sequence model with Keras in Colab. I'm using the Keras Attention layer. Here is the definition of the model:

from tensorflow.keras.layers import (Input, Embedding, LSTM, Attention,
                                     Concatenate, TimeDistributed, Dense)
from tensorflow.keras.models import Model

embedding_size = 200
UNITS = 128

encoder_inputs = Input(shape=(None,), name="encoder_inputs")

encoder_embs=Embedding(num_encoder_tokens, embedding_size, name="encoder_embs")(encoder_inputs)

#encoder lstm: keep only the final hidden and cell states
encoder = LSTM(UNITS, return_state=True, name="encoder_LSTM")
encoder_outputs, state_h, state_c = encoder(encoder_embs)

encoder_states = [state_h, state_c]

decoder_inputs = Input(shape=(None,), name="decoder_inputs")
decoder_embs = Embedding(num_decoder_tokens, embedding_size, name="decoder_embs")(decoder_inputs)

#decoder lstm
decoder_lstm = LSTM(UNITS, return_sequences=True, return_state=True, name="decoder_LSTM")
decoder_outputs, _, _ = decoder_lstm(decoder_embs, initial_state=encoder_states)

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])
decoder_outputs = TimeDistributed(Dense(units=num_decoder_tokens, 
                                  activation='softmax', name="decoder_denseoutput"))(decoder_concatenate)

model=Model([encoder_inputs, decoder_inputs], decoder_outputs, name="s2s_model")
model.compile(optimizer='RMSprop', loss='categorical_crossentropy', metrics=['accuracy'])

model.summary()

The model compiles fine, no problems whatsoever. The encoder and decoder input and output shapes are:

Encoder training input shape:  (4000, 21)
Decoder training input shape:  (4000, 12)
Decoder training target shape:  (4000, 12, 3106)
--
Encoder test input shape:  (385, 21)

This is the model.fit code:

model.fit([encoder_training_input, decoder_training_input], decoder_training_target,
      epochs=100,
      batch_size=32,
      validation_split=0.2,)

When I run the fit phase, I get this error from the Concatenate layer:

ValueError: Dimension 1 in both shapes must be equal, but are 12 and 32. 
Shapes are [32,12] and [32,32]. for '{{node s2s_model/concat_layer/concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](s2s_model/decoder_LSTM/PartitionedCall:1,
s2s_model/attention_layer/MatMul_1, s2s_model/concat_layer/concat/axis)' with input shapes: [32,12,128], [32,32,128], [] and with computed input tensors: input[2] = <2>.

So the first 32 is the batch_size, 128 is the output size of decoder_outputs and attention_out, and 12 is the number of tokens in the decoder inputs. I can't understand how to solve this error. I don't think I can change the number of input tokens, so any suggestions for me?


There are 2 answers

Gianni Pinotti (BEST ANSWER)

Solved this thanks to @Majitsima. I swapped the inputs to the Attention layer, so instead of

attention=Attention(name="attention_layer")
attention_out=attention([encoder_outputs, decoder_outputs])

the input is

attention=Attention(name="attention_layer")
attention_out=attention([decoder_outputs, encoder_outputs])

with

decoder_concatenate=Concatenate(axis=-1, name="concat_layer")([decoder_outputs, attention_out])

Everything seems to work now, so thank you again @Majitsima, hope this can help!
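A minimal sketch of why the swap works (the shapes are taken from the question, and this assumes the encoder hands a full 3D sequence to the attention layer): the Keras Attention layer treats its first input as the query, and its output takes the query's time dimension. With the decoder states first, the attention output matches decoder_outputs along every axis except the last, so the last-axis concatenation goes through.

```python
import tensorflow as tf
from tensorflow.keras.layers import Attention, Concatenate

# Shapes assumed from the question: batch 32, decoder length 12,
# encoder length 21, 128 LSTM units.
batch, dec_len, enc_len, units = 32, 12, 21, 128
decoder_outputs = tf.random.normal((batch, dec_len, units))  # query
encoder_outputs = tf.random.normal((batch, enc_len, units))  # value

# The output time dimension comes from the first (query) input,
# so [decoder_outputs, encoder_outputs] yields (32, 12, 128).
attention_out = Attention()([decoder_outputs, encoder_outputs])
print(attention_out.shape)  # (32, 12, 128)

# That lines up with decoder_outputs for the last-axis concatenation:
concat = Concatenate(axis=-1)([decoder_outputs, attention_out])
print(concat.shape)  # (32, 12, 256)
```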

Majitsima

Replace axis=-1 with axis=1 in the concatenation layer. The example in this documentation should clarify why.

Your problem resides in the inputs passed to the concatenation. To concatenate two differently shaped matrices, or tensors as they are called in Tensorflow, you need to specify the axis along which they are allowed to differ. The shapes [32, 12, 128] and [32, 32, 128] differ in the second dimension, which you reference by passing 1 (because dimensions are numbered from 0 upwards). Concatenating there results in a shape of [32, (12+32), 128] = [32, 44, 128], increasing the number of elements in the second dimension.

When you specify axis as -1 (the default value), the concatenation runs along the last dimension and therefore requires all the other dimensions to match, which in your case does not work because the second dimensions (12 and 32) differ.
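The shape rule above can be checked directly with toy tensors using the shapes from the traceback (this only demonstrates the axis behaviour; it does not address whether concatenating along the time dimension is what the model actually wants):

```python
import tensorflow as tf
from tensorflow.keras.layers import Concatenate

a = tf.zeros((32, 12, 128))  # decoder_outputs shape from the traceback
b = tf.zeros((32, 32, 128))  # attention_out shape from the traceback

# axis=1 concatenates along the mismatched second dimension:
c = Concatenate(axis=1)([a, b])
print(c.shape)  # (32, 44, 128)

# axis=-1 concatenates along the last dimension and needs every other
# dimension to agree, so these two shapes raise a ValueError:
try:
    Concatenate(axis=-1)([a, b])
except ValueError as err:
    print(type(err).__name__)  # ValueError
```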