I've been reading a lot about transformers and self-attention and have seen that both BERT and GPT-2 are newer variants that use only one half of the original architecture: an encoder-only transformer (BERT) and a decoder-only transformer (GPT-2). I've been trying to build a decoder-only model for next-token prediction but am confused by one thing. I'm using PyTorch and have looked at the Seq2Seq tutorial and then at the TransformerDecoder block, which is made up of TransformerDecoderLayers. My confusion comes from the memory these layers also need to be passed. The documentation says memory is the output of the last layer of the encoder block, which makes sense for a Seq2Seq model, but I want to build a decoder-only model. So my question is: what do you pass as memory to a decoder-only model like GPT-2 if you do not have an encoder?
After further investigation I believe I can now answer this myself. A decoder-only transformer doesn't actually use any memory, because there is no encoder-decoder cross-attention in it like there is in an encoder-decoder transformer. A decoder-only transformer looks a lot like an encoder transformer, except that it uses a masked self-attention layer instead of a plain self-attention layer. To do this you can pass a square subsequent mask (upper triangular) so that the model cannot look forward, which gives you a decoder-only model like the one found in GPT-2/GPT-3. A minimal sketch is below.
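For anyone who lands here, this is a minimal sketch of that idea in PyTorch, assuming you build the stack from `nn.TransformerEncoderLayer` (which has no memory argument) and supply a causal mask yourself; all the sizes and names are illustrative choices, not anything from the question:

```python
import torch
import torch.nn as nn

# Illustrative sizes only
d_model, n_heads, n_layers, vocab_size, seq_len, batch = 128, 4, 2, 1000, 16, 1

# Decoder-only stack built from encoder layers: there is no
# encoder-decoder attention, so no memory is ever needed.
layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads)
decoder_only = nn.TransformerEncoder(layer, num_layers=n_layers)
embed = nn.Embedding(vocab_size, d_model)

# Square subsequent (upper-triangular) mask: position i cannot attend to j > i.
causal_mask = torch.triu(
    torch.full((seq_len, seq_len), float("-inf")), diagonal=1
)

tokens = torch.randint(0, vocab_size, (seq_len, batch))  # (seq, batch)
x = embed(tokens)                                        # (seq, batch, d_model)
out = decoder_only(x, mask=causal_mask)                  # no memory argument
print(out.shape)                                         # torch.Size([16, 1, 128])
```

The same effect can be had with `nn.TransformerDecoderLayer` by dropping its cross-attention, but reusing the encoder layer with a causal mask is the simpler route and matches what GPT-style blocks actually compute.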