How to Implement "Multidirectional" LSTMs?


I'm trying to implement the LSTM architecture from the paper "Dropout improves Recurrent Neural Networks for Handwriting Recognition": [figure: architecture diagram from the paper]

In the paper, the researchers define multidirectional LSTM layers as "four LSTM layers applied in parallel, each with a particular scanning direction".

Here's what (I think) the network looks like in Keras:

from keras.layers import LSTM, Dropout, Convolution2D, Merge, Dense, Activation, TimeDistributed
from keras.models import Sequential

def build_lstm_dropout(inputdim, outputdim, return_sequences=True, activation='tanh'):
    model = Sequential()
    # input_shape=(None, inputdim) lets the LSTM take variable-length sequences
    model.add(LSTM(output_dim=outputdim, return_sequences=return_sequences,
                   activation=activation, input_shape=(None, inputdim)))
    model.add(Dropout(0.5))
    return model

def build_conv(nb_filter, nb_row, nb_col, border_mode='valid'):
    # TimeDistributed must wrap the layer itself, not the tensor it returns
    return TimeDistributed(Convolution2D(nb_filter, nb_row, nb_col,
                                         border_mode=border_mode, activation='relu'))

def build_lstm_conv(lstm, conv):
    model = Sequential()
    model.add(lstm)
    model.add(conv)
    return model

def build_merged_lstm_conv_layer(lstm_conv, mode='concat'):
    # Note: passing the same model four times shares one set of weights;
    # the paper uses a separate LSTM per scanning direction
    return Merge([lstm_conv, lstm_conv, lstm_conv, lstm_conv], mode=mode)

def build_model(feature_dim, loss='ctc_cost_for_train', optimizer='Adadelta'):
    # 'ctc_cost_for_train' is meant to be a custom CTC loss;
    # Keras has no built-in loss under that name

    lstm = build_lstm_dropout(2, 6)
    conv = build_conv(64, 2, 4)

    lstm_conv = build_lstm_conv(lstm, conv)

    first_layer = build_merged_lstm_conv_layer(lstm_conv)

    lstm = build_lstm_dropout(10, 20)
    conv = build_conv(128, 2, 4)

    lstm_conv = build_lstm_conv(lstm, conv)

    second_layer = build_merged_lstm_conv_layer(lstm_conv)

    lstm = build_lstm_dropout(50, 1)
    fully_connected = Dense(1, activation='sigmoid')

    lstm_fc = Sequential()
    lstm_fc.add(lstm)
    lstm_fc.add(fully_connected)

    third_layer = Merge([lstm_fc, lstm_fc, lstm_fc, lstm_fc], mode='concat')

    final_model = Sequential()
    final_model.add(first_layer)
    final_model.add(Activation('tanh'))
    final_model.add(second_layer)
    final_model.add(Activation('tanh'))
    final_model.add(third_layer)

    final_model.compile(loss=loss, optimizer=optimizer, sample_weight_mode='temporal')

    return final_model

And here are my questions:

  1. If my implementation of the architecture is correct, how do you implement the scanning directions for the four LSTM layers?
  2. If my implementation is not correct, is it possible to implement such an architecture in Keras? If not, are there other frameworks that would help me implement such an architecture?

1 Answer

Answered by Van:

You can check this for an implementation of a bidirectional LSTM. Basically, you just set go_backwards=True for the backward LSTM.
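For instance, a minimal forward/backward pair in the Keras 1.x functional API could look like this (the layer sizes are placeholders I picked for illustration, not values from the paper):

from keras.layers import Input, Lambda, LSTM, merge
from keras.models import Model

inp = Input(shape=(None, 16))  # (timesteps, features); 16 is a placeholder

forward  = LSTM(32, return_sequences=True)(inp)
backward = LSTM(32, return_sequences=True, go_backwards=True)(inp)

# go_backwards=True scans the sequence in reverse but also emits the
# outputs in reversed order, so flip them back before merging
backward = Lambda(lambda x: x[:, ::-1, :])(backward)

bidirectional = merge([forward, backward], mode='concat')
model = Model(input=inp, output=bidirectional)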

However, in your case you will have to write a "mirror" + reshape layer to reverse the rows. A mirror layer can look like this (using a Lambda layer for convenience): Lambda(lambda x: x[:,::-1,:])
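Extending that idea to four scanning directions over a 2D grid, a rough sketch could look like the following. This is my own construction, not code from the paper: it assumes a fixed grid size so that Reshape knows its target shape, and uses placeholder layer sizes.

from keras.layers import Input, Lambda, Reshape, LSTM, merge
from keras.models import Model

# Hypothetical fixed grid of feature vectors; real handwriting inputs
# are variable-sized, which needs more plumbing than this sketch shows
ROWS, COLS, FEAT = 8, 8, 16

inp = Input(shape=(ROWS, COLS, FEAT))

# Mirror layers: flip the grid vertically / horizontally
flip_rows = Lambda(lambda x: x[:, ::-1, :, :])
flip_cols = Lambda(lambda x: x[:, :, ::-1, :])

def scan_direction(x, flip_v=False, flip_h=False):
    # Scan the grid in raster order from one of the four corners, then
    # undo the flips so all four outputs stay spatially aligned.
    # Each call builds a fresh LSTM, so every direction gets its own weights.
    if flip_v:
        x = flip_rows(x)
    if flip_h:
        x = flip_cols(x)
    h = Reshape((ROWS * COLS, FEAT))(x)        # grid -> sequence
    h = LSTM(32, return_sequences=True)(h)     # 32 units: placeholder size
    h = Reshape((ROWS, COLS, 32))(h)           # sequence -> grid
    if flip_h:
        h = flip_cols(h)
    if flip_v:
        h = flip_rows(h)
    return h

directions = [scan_direction(inp, fv, fh)
              for fv in (False, True) for fh in (False, True)]
out = merge(directions, mode='concat')         # concatenate along features
model = Model(input=inp, output=out)

Undoing the flips after each scan keeps the four output grids spatially aligned, so their features can be concatenated position by position.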