I'm trying to implement this LSTM Architecture from the paper "Dropout improves Recurrent Neural Networks for Handwriting Recognition":
In the paper, the researchers defined Multidirectional LSTM Layers as "Four LSTM layers applied in parallel, each with a particular scanning direction"
Here's what (I think) the network looks like in Keras:
from keras.layers import LSTM, Dropout, Input, Convolution2D, Merge, Dense, Activation, TimeDistributed
from keras.models import Sequential

def build_lstm_dropout(inputdim, outputdim, return_sequences=True, activation='tanh'):
    net_input = Input(shape=(None, inputdim))
    model = Sequential()
    lstm = LSTM(output_dim=outputdim, return_sequences=return_sequences, activation=activation)(net_input)
    model.add(lstm)
    model.add(Dropout(0.5))
    return model

def build_conv(nb_filter, nb_row, nb_col, net_input, border_mode='relu'):
    return TimeDistributed(Convolution2D(nb_filter, nb_row, nb_col, border_mode=border_mode, activation='relu')(net_input))

def build_lstm_conv(lstm, conv):
    model = Sequential()
    model.add(lstm)
    model.add(conv)
    return model

def build_merged_lstm_conv_layer(lstm_conv, mode='concat'):
    return Merge([lstm_conv, lstm_conv, lstm_conv, lstm_conv], mode=mode)

def build_model(feature_dim, loss='ctc_cost_for_train', optimizer='Adadelta'):
    net_input = Input(shape=(1, feature_dim, None))

    lstm = build_lstm_dropout(2, 6)
    conv = build_conv(64, 2, 4, net_input)
    lstm_conv = build_lstm_conv(lstm, conv)
    first_layer = build_merged_lstm_conv_layer(lstm_conv)

    lstm = build_lstm_dropout(10, 20)
    conv = build_conv(128, 2, 4, net_input)
    lstm_conv = build_lstm_conv(lstm, conv)
    second_layer = build_merged_lstm_conv_layer(lstm_conv)

    lstm = build_lstm_dropout(50, 1)
    fully_connected = Dense(1, activation='sigmoid')
    lstm_fc = Sequential()
    lstm_fc.add(lstm)
    lstm_fc.add(fully_connected)
    third_layer = Merge([lstm_fc, lstm_fc, lstm_fc, lstm_fc], mode='concat')

    final_model = Sequential()
    final_model.add(first_layer)
    final_model.add(Activation('tanh'))
    final_model.add(second_layer)
    final_model.add(Activation('tanh'))
    final_model.add(third_layer)
    final_model.compile(loss=loss, optimizer=optimizer, sample_weight_mode='temporal')
    return final_model
And here are my questions:
- If my implementation of the architecture is correct, how do you implement the scanning directions for the four LSTM layers?
- If my implementation is not correct, is it possible to implement such an architecture in Keras? If not, are there any other frameworks that can help me in implementing such an architecture?
You can check this for the implementation of bidirectional LSTM. Basically, you just set
go_backwards=True
for the backward LSTM. However, in your case, you have to write a "mirror" + reshape layer to reverse the rows. A mirror layer can look like this (I am using a Lambda layer here for convenience):
Lambda(lambda x: x[:,::-1,:])
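To make the mirroring idea concrete, here is a minimal sketch in plain Python (no Keras) of how the four scanning directions could be obtained by flipping the input. All function names here are hypothetical, and a running sum stands in for the recurrent layer, so this only illustrates the flip / scan / flip-back pattern, not the paper's actual layer:

```python
from itertools import accumulate

def flip_rows(grid):
    """Reverse the order of the rows (same idea as x[::-1] on the row axis)."""
    return grid[::-1]

def flip_cols(grid):
    """Reverse each row, i.e. mirror the column axis."""
    return [row[::-1] for row in grid]

def four_direction_views(grid):
    """Return the input as seen from each of the four corner starting points.

    Assumption: each 'scanning direction' is modelled as a forward scan over a
    flipped copy of the input, one copy per corner of the 2D grid.
    """
    return {
        "top_left": [row[:] for row in grid],
        "top_right": flip_cols(grid),
        "bottom_left": flip_rows(grid),
        "bottom_right": flip_cols(flip_rows(grid)),
    }

def forward_scan(seq):
    """Stand-in for a recurrent layer: a running sum from left to right."""
    return list(accumulate(seq))

def backward_scan(seq):
    """Mirror the input, scan forward, then mirror the output back."""
    return forward_scan(seq[::-1])[::-1]

grid = [[1, 2, 3],
        [4, 5, 6]]
views = four_direction_views(grid)
print(views["bottom_right"])      # the grid flipped along both axes
print(backward_scan([1, 2, 3]))   # each position has seen everything to its right
```

In the real model, each of the four views would feed its own LSTM, and each output would be flipped back the same way before the four results are merged, so that all positions line up again.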