Replace bidirectional LSTM with GRU in coref?


I am training the coarse-to-fine coreference model from AllenNLP (for a language other than English), using the template config from bert_lstm.jsonnet. When I replace the context layer's type "lstm" with "gru", it works, but the change seems to have very little impact on training: the same 63 GB of RAM are consumed each epoch, and the validation F1-score hovers around the same value. Does this config change actually replace the Bi-LSTM layer with a Bi-GRU layer, or am I missing something?

    "context_layer": {
    "type": "gru",
    "bidirectional": true,
    "hidden_size": gru_dim,
    "input_size": bert_dim,
    "num_layers": 1
},
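For reference, here is a quick sanity check I would expect to confirm the swap (a sketch, assuming AllenNLP 1.x/2.x, where "gru" is registered as a Seq2SeqEncoder wrapping torch.nn.GRU; the dimensions below are illustrative, not taken from my config):

    # Sketch: check that the Seq2SeqEncoder registered under "gru"
    # really wraps torch.nn.GRU (assumes AllenNLP 1.x/2.x).
    from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder

    encoder_cls = Seq2SeqEncoder.by_name("gru")
    encoder = encoder_cls(input_size=768, hidden_size=200,
                          num_layers=1, bidirectional=True)
    # The PyTorch wrapper stores the underlying RNN in `_module`.
    print(type(encoder._module))  # <class 'torch.nn.modules.rnn.GRU'>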

1 Answer

Dirk Groeneveld:

It would take some experimentation to be sure, but I assume what's going on is that all the heavy lifting happens inside BERT (your embedder), and the context layer does very little regardless of whether it's a GRU or an LSTM. If you look at the similar SpanBERT config, the context layer there is just a pass-through.
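For illustration, the SpanBERT-style config wires the context layer up as AllenNLP's identity encoder, which forwards the BERT embeddings unchanged (a sketch in the style of the question's config; the variable name bert_dim is borrowed from there):

    "context_layer": {
        "type": "pass_through",
        "input_dim": bert_dim
    },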

The same goes for memory: most of it is consumed by BERT, and the context layer contributes little to the overall consumption.
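A rough back-of-the-envelope comparison of parameter counts shows why (a sketch; the dimensions are illustrative, assuming bert-base with bert_dim = 768 and a hidden size of 200):

    # Sketch: compare the context layer's parameter count to BERT's.
    # Dimensions are illustrative (bert-base: 768; hidden size: 200).
    import torch

    gru = torch.nn.GRU(input_size=768, hidden_size=200,
                       num_layers=1, bidirectional=True, batch_first=True)
    n_gru = sum(p.numel() for p in gru.parameters())
    print(f"Bi-GRU context layer: {n_gru:,} params")   # ~1.2M
    print("BERT-base embedder:  ~110,000,000 params")  # roughly 100x larger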