Replace bidirectional LSTM with GRU in coref?


I am training the coarse-to-fine coreference model from AllenNLP (for a language other than English), using the template config from bert_lstm.jsonnet. When I replace the context layer's type "lstm" with "gru", it works, but the change seems to have very little impact on training: the same 63 GB of RAM are consumed each epoch, and the validation F1-score hovers around the same value. Does this config change actually replace the Bi-LSTM layer with a Bi-GRU layer, or am I missing something?

    "context_layer": {
    "type": "gru",
    "bidirectional": true,
    "hidden_size": gru_dim,
    "input_size": bert_dim,
    "num_layers": 1
},
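For reference, here is a quick sanity check I would expect to confirm the swap (a sketch, assuming AllenNLP 1.x/2.x, where "gru" is registered as a Seq2SeqEncoder wrapping torch.nn.GRU; the dimensions below are illustrative, not taken from my config):

    # Sketch: check that the Seq2SeqEncoder registered under "gru"
    # really wraps torch.nn.GRU (assumes AllenNLP 1.x/2.x).
    from allennlp.modules.seq2seq_encoders import Seq2SeqEncoder

    encoder_cls = Seq2SeqEncoder.by_name("gru")
    encoder = encoder_cls(input_size=768, hidden_size=200,
                          num_layers=1, bidirectional=True)
    # The PyTorch wrapper stores the underlying RNN in `_module`.
    print(type(encoder._module))  # <class 'torch.nn.modules.rnn.GRU'>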

1 Answer

Dirk Groeneveld:

It would take some experimentation to be sure, but I assume what's going on is that all the heavy lifting happens inside BERT (your embedder), and the context layer does very little regardless of whether it's a GRU or an LSTM. If you look at the similar SpanBERT config, the context layer there is just a pass-through.
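For illustration, the SpanBERT-style config wires the context layer up as AllenNLP's identity encoder, which forwards the BERT embeddings unchanged (a sketch in the style of the question's config; the variable name bert_dim is borrowed from there):

    "context_layer": {
        "type": "pass_through",
        "input_dim": bert_dim
    },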

The same goes for memory: most of it is consumed by BERT, and the context layer contributes little to the overall consumption.
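A rough back-of-the-envelope comparison of parameter counts shows why (a sketch; the dimensions are illustrative, assuming bert-base with bert_dim = 768 and a hidden size of 200):

    # Sketch: compare the context layer's parameter count to BERT's.
    # Dimensions are illustrative (bert-base: 768; hidden size: 200).
    import torch

    gru = torch.nn.GRU(input_size=768, hidden_size=200,
                       num_layers=1, bidirectional=True, batch_first=True)
    n_gru = sum(p.numel() for p in gru.parameters())
    print(f"Bi-GRU context layer: {n_gru:,} params")   # ~1.2M
    print("BERT-base embedder:  ~110,000,000 params")  # roughly 100x larger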