In deep learning, in particularly NLP, words are transformed into a vector representation to be fed into a neural network such as an RNN. By referring to the link:

In the section of Word Embeddings, it is said that:

A word embedding W:words→Rn is a paramaterized function mapping words in some language to high-dimensional vectors (perhaps 200 to 500 dimensions)

I do not understand the purpose of the dimension of the vectors. What does it mean to have a vector of 200 dimensions compared to a vector of 20 dimensions?

Does it improve the overall accuracy of the model? Could anyone give me a simple example regarding the choice of dimension of the vectors.

1 Answers

mujjiga On Best Solutions

These word embeddings also called Distributed Word Embedding is based on

you know a word by the company it keeps

as quoted by John Rupert Firth

So we know the meaning of a word by its context. You can think of each scalar in the vector (of a word) represents its strength for a concept. This slide from Prof. Pawan Goyal explains it all.

enter image description here

So you want good vector size to capture decent amount of concepts but you do not want a too huge vector because it will then become the bottleneck in training of models where these embeddings are used.

Also the vector size is mostly fixed as most do not train their own embedding but rather use openly available embeddings as they are trained for many hours on huge data. So using them will force us to use an embedding layers with dimensions as given by the openly available embedding you are using (word2vec, glove etc)

Distributed Word Embeddings is a major milestone in the area of deep learning in NLP. They give better accuracy as compared of tfidf based embeddings.