I am trying to train an LSTM in Keras (TensorFlow backend) on toy data and am getting this error:

ValueError: Error when checking target: expected dense_39 to have 2 dimensions, but got array with shape (996, 1, 1)

The error occurs immediately upon calling model.fit; nothing seems to run. It seems to me that Keras is checking dimensions, but ignoring the fact that it should be taking batches of my target with each batch of my input. The error shows the full dimension of my target array, which implies to me that it's never split into batches by Keras, at least while checking dimensions. For the life of me I can't figure out why this would be or anything else that might help.

My network definition with expected layer output shapes in comments:

batch_shape = (8, 5, 1)
x_in = Input(batch_shape=batch_shape, name='input')  # (8, 5, 1)
seq1 = LSTM(8, return_sequences=True, stateful=True)(x_in)  # (8, 5, 8)
dense1 = TimeDistributed(Dense(8))(seq1)  # (8, 5, 8)
seq2 = LSTM(8, return_sequences=False, stateful=True)(dense1)  # (8, 8)
dense2 = Dense(8)(seq2)  # (8, 8)
out = Dense(1)(dense2)  # (8, 1)

model = Model(inputs=x_in, outputs=out)
optimizer = Nadam()
model.compile(optimizer=optimizer, loss='mean_squared_error')
model.summary()

The model summary, shapes as expected:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           (8, 5, 1)                 0         
_________________________________________________________________
lstm_28 (LSTM)               (8, 5, 8)                 320       
_________________________________________________________________
time_distributed_18 (TimeDis (8, 5, 8)                 72        
_________________________________________________________________
lstm_29 (LSTM)               (8, 8)                    544       
_________________________________________________________________
dense_38 (Dense)             (8, 8)                    72        
_________________________________________________________________
dense_39 (Dense)             (8, 1)                    9         
=================================================================
Total params: 1,017
Trainable params: 1,017
Non-trainable params: 0
_________________________________________________________________

My toy data, where the target is just a line decreasing from 100 to 0, and the input is just an array of zeros. I want to do one-step-ahead prediction, so I create rolling windows of my input and target using a rolling_window() method defined below:

target = np.linspace(100, 0, num=1000)
target_rolling = rolling_window(target[4:], 1)[:, :, None]
target_rolling.shape  # (996, 1, 1)  <-- this seems to be the array that's causing the error
x_train = np.zeros((1000,))
x_train_rolling = rolling_window(x_train, 5)[:, :, None]
x_train_rolling.shape  # (996, 5, 1)

The rolling_window() method:

def rolling_window(arr, window):
    shape = arr.shape[:-1] + (arr.shape[-1] - window + 1, window)
    strides = arr.strides + (arr.strides[-1],)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)
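As a quick sanity check of `rolling_window()` (shapes only; this just re-runs the function above on a small array and then on the question's 1000-point input):

```python
import numpy as np

def rolling_window(arr, window):
    # Strided view: each row is a length-`window` slice of `arr`,
    # shifted by one element per row. No data is copied.
    shape = arr.shape[:-1] + (arr.shape[-1] - window + 1, window)
    strides = arr.strides + (arr.strides[-1],)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

# 10 points with a window of 5 -> 10 - 5 + 1 = 6 overlapping windows
w = rolling_window(np.arange(10, dtype=float), 5)
print(w.shape)  # (6, 5)
print(w[0])     # [0. 1. 2. 3. 4.]
print(w[1])     # [1. 2. 3. 4. 5.]

# The shapes from the question: 1000 zeros, window 5 -> (996, 5),
# then a trailing feature axis -> (996, 5, 1)
x = rolling_window(np.zeros(1000), 5)[:, :, None]
print(x.shape)  # (996, 5, 1)
```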

And my training loop:

reset_state = LambdaCallback(on_epoch_end=lambda epoch, logs: model.reset_states())
callbacks = [reset_state]
history = model.fit(x_train_rolling, target_rolling,
                    batch_size=8,
                    epochs=100,
                    validation_split=0.,
                    callbacks=callbacks)

I have tried:

  • Non-stateful LSTM, but I really need stateful for the eventual application. Same error.
  • return_sequences=True in the second LSTM with a Flatten layer after. Same error.
  • return_sequences=True without a Flatten layer. This gives a different error, because then Keras expects a target with the same shape as the output, which at that point is (batch_size, 5, 1) and not (batch_size, 1, 1).
  • Running the same architecture on the whole sequence at once (batch size of 1), without rolling windows. This works, but just learns to approximate the mean of my target and is useless for my purposes.
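For completeness, if one did keep return_sequences=True, the target would need a matching per-timestep shape. A sketch of how rolling_window could build such a rank-3 target (shapes only; the one-step-ahead offset and the name y_seq are my assumptions, not from the question):

```python
import numpy as np

def rolling_window(arr, window):
    shape = arr.shape[:-1] + (arr.shape[-1] - window + 1, window)
    strides = arr.strides + (arr.strides[-1],)
    return np.lib.stride_tricks.as_strided(arr, shape=shape, strides=strides)

# With return_sequences=True on the last LSTM, the model emits one value per
# timestep, so the target must be rank 3: (n_windows, window, 1).
target = np.linspace(100, 0, num=1000)
y_seq = rolling_window(target[1:], 5)[:, :, None]  # targets shifted by one step
print(y_seq.shape)  # (995, 5, 1)
```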

Note that none of the related questions I found directly answer mine, although a couple looked promising.

1 Answer

Answer by Tom (accepted):

Posting the solution I wrote in the comments: the target array has an extra dimension. In reshape, -1 makes that dimension adjust automatically to whatever it has to be to fit the other dimensions. Since only two dimensions are given, (-1, 1) turns the array into shape (996, 1). So apply

target_rolling = target_rolling.reshape(-1, 1)

before calling model.fit; target_rolling.shape then becomes (996, 1) instead of (996, 1, 1), which matches the model's (batch_size, 1) output.
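In plain NumPy, the fix and its effect on shapes (a minimal sketch, Keras not required; the variable name `fixed` is mine, for illustration):

```python
import numpy as np

# Reconstruct the offending target: (996, 1, 1), with a redundant middle axis.
target = np.linspace(100, 0, num=1000)
target_rolling = target[4:][:, None, None]
print(target_rolling.shape)  # (996, 1, 1)

# reshape(-1, 1): the -1 is inferred as 996, giving (996, 1), which matches
# the final Dense(1) layer's (batch_size, 1) output.
fixed = target_rolling.reshape(-1, 1)
print(fixed.shape)  # (996, 1)

# Equivalent alternative: drop the middle axis explicitly.
assert np.array_equal(fixed, np.squeeze(target_rolling, axis=1))
```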