How does nn.Embedding work when developing an encoder-decoder model?

This tutorial teaches how to develop a simple encoder-decoder model with attention using PyTorch. However, in the encoder or decoder, self.embedding = nn.Embedding(input_size, hidden_size) (or similar) is defined. The PyTorch documentation describes nn.Embedding as "A simple lookup table that stores embeddings of a fixed dictionary and size." So I am confused: at initialization, where does this lookup table come from? Does it initialize random embeddings for the indices, which are then trained? Is it really necessary in the encoder/decoder part? Thanks in advance.

Best answer, by dedObed:

Answering the last question first: yes, we do need an Embedding or an equivalent, at least when dealing with discrete inputs (e.g. letters or words of a language). These tokens come encoded as integers (e.g. 'a' -> 1, 'b' -> 2, etc.), but those numbers carry no meaning: the letter 'b' is not "like 'a', but more", as its integer encoding would suggest. So we provide the Embedding so that the network can learn to represent these letters by something useful, e.g. by making vowels similar to one another in some way.
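A minimal sketch, assuming a hypothetical five-letter vocabulary, of what the "lookup table" amounts to in PyTorch: nn.Embedding holds a learnable weight matrix with one row per token id, and calling it simply selects rows. The one-hot comparison is only there to show that the lookup is equivalent to multiplying a one-hot vector by that matrix; indexing is just the cheap way to do it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy vocabulary: 5 letters, each mapped to an integer id.
vocab = {"a": 0, "b": 1, "c": 2, "d": 3, "e": 4}
embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=3)

ids = torch.tensor([vocab["b"], vocab["a"], vocab["d"]])  # shape (3,)
vectors = embedding(ids)  # shape (3, 3): one learnable row per id

# The lookup is plain row indexing into the weight matrix...
assert torch.equal(vectors, embedding.weight[ids])

# ...which gives the same result as multiplying a one-hot encoding by that matrix.
one_hot = F.one_hot(ids, num_classes=len(vocab)).float()  # shape (3, 5)
assert torch.allclose(vectors, one_hot @ embedding.weight)

print(vectors.shape)  # torch.Size([3, 3])
```

The integer ids themselves never enter any arithmetic; they only choose which row is used, so their numeric order no longer implies anything.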

During initialization, the embedding vectors are sampled randomly, like the other weights in the model (for nn.Embedding, PyTorch's default is a standard normal distribution N(0, 1)), and they are optimized together with the rest of the model. It is also possible to initialize them from pretrained embeddings (e.g. word2vec, GloVe, FastText), but then care must be taken not to destroy them by backpropagating through a still randomly initialized model.
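The sketch below uses toy sizes, a dummy loss and a random tensor standing in for real pretrained vectors (all assumptions for illustration). It shows that the embedding weight is an ordinary trainable parameter that the optimizer updates (only the rows that were looked up receive gradient), and how nn.Embedding.from_pretrained with freeze=True keeps pretrained vectors fixed, which is one common way to protect them while the rest of the model is still random.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Randomly initialised: nn.Embedding fills its weight from N(0, 1) by default.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
assert emb.weight.requires_grad  # a regular learnable parameter

before = emb.weight.detach().clone()

# One toy optimisation step with a dummy loss on tokens 1 and 3.
optimizer = torch.optim.SGD(emb.parameters(), lr=0.1)
ids = torch.tensor([1, 3])
loss = emb(ids).pow(2).sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Only the rows that were looked up got a non-zero gradient, so only they changed.
changed_rows = (emb.weight.detach() != before).any(dim=1).nonzero().flatten()
print(changed_rows.tolist())  # [1, 3]

# Hypothetical stand-in for word2vec/GloVe/FastText vectors (10 words x 4 dims).
pretrained = torch.randn(10, 4)
frozen = nn.Embedding.from_pretrained(pretrained, freeze=True)
assert not frozen.weight.requires_grad  # excluded from backprop updates
# To fine-tune them later, once the rest of the model has settled:
# frozen.weight.requires_grad_(True)
```

Freezing first and unfreezing later is only one option; others include a smaller learning rate for the embedding parameters.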


Embeddings are not strictly necessary, but it would be very wasteful to force the network to learn that 13314 ('items') is very similar to 89137 ('values') but completely different from 13315 ('japan'). And such a network would probably not even remotely converge anyway.
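Tying this back to the question: below is a minimal, hypothetical encoder in the spirit of the one described. The class name EncoderRNN, batch_first=True and the sizes are assumptions here, not necessarily the tutorial's exact code. The embedding turns token ids into dense vectors before the recurrent layer sees them; a decoder would typically have its own nn.Embedding over the target vocabulary, which is why one shows up in both parts.

```python
import torch
import torch.nn as nn


class EncoderRNN(nn.Module):
    """Hypothetical minimal encoder: token ids -> embeddings -> GRU."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # input_size = vocabulary size, hidden_size = embedding (and GRU) width.
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, batch_first=True)

    def forward(self, token_ids: torch.Tensor):
        # token_ids: (batch, seq_len) tensor of integer ids
        embedded = self.embedding(token_ids)  # (batch, seq_len, hidden_size)
        outputs, hidden = self.gru(embedded)  # outputs: (batch, seq_len, hidden_size)
        return outputs, hidden                # hidden: (1, batch, hidden_size)


torch.manual_seed(0)
encoder = EncoderRNN(input_size=100, hidden_size=16)
batch = torch.randint(0, 100, (2, 7))  # 2 hypothetical sentences of 7 token ids each
outputs, hidden = encoder(batch)
print(outputs.shape, hidden.shape)  # torch.Size([2, 7, 16]) torch.Size([1, 2, 16])
```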