How to use pre-trained GloVe vectors in a TensorFlow LSTM generative model


I would like to know whether it is possible to use pre-trained GloVe vectors in a TensorFlow word-rnn LSTM generative model, and if so, how to go about it.

I am referencing the code from here, and I understand (I think) that I am supposed to put the vectors into the embedding in lines 35-37 of model.py. From the code, I can see that the author is not using any pre-trained vectors; the embedding is learned from the words of the input text.

I have seen other answers like this one, but as I am new to TensorFlow and Python I do not fully understand how to apply them to this code.
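
From those answers I gather that the change would look roughly like the sketch below (TensorFlow 1.x style to match the linked code; the variable names and sizes are my own guesses, not the actual contents of model.py):

```python
# Rough sketch only: names and sizes are guesses, not the real model.py code.
import numpy as np
import tensorflow as tf

vocab_size, embedding_dim = 10000, 100            # assumed sizes
pretrained = np.random.rand(vocab_size, embedding_dim).astype(np.float32)
# ^ in practice this would be a matrix built from the GloVe vector file,
#   with one row per word of the model's vocabulary

# replace the randomly initialised embedding with the pre-trained matrix
embedding = tf.get_variable("embedding",
                            initializer=tf.constant(pretrained),
                            trainable=False)      # or True to fine-tune
input_ids = tf.placeholder(tf.int32, [None, None])     # batch of word indices
inputs = tf.nn.embedding_lookup(embedding, input_ids)
```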

GloVe generates two files, namely:

  1. a vocabulary file, listing every word with its occurrence count
  2. a vector file, e.g. for the word also: also -0.5432 -0.3210 0.1234 ... (one value per dimension)
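
For reference, this is roughly how I read the vector file into Python (the file name and dimension are just examples of the standard GloVe downloads):

```python
# Minimal sketch of reading a GloVe vector file into a {word: vector} dict.
import numpy as np

def load_glove(path):
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word = parts[0]
            embeddings[word] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

glove = load_glove("glove.6B.100d.txt")   # e.g. the 100-dimensional vectors
print(len(glove), glove["also"].shape)    # roughly 400000 entries, (100,)
```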

Also, do I have to generate the GloVe vectors from the same corpus that the LSTM model is trained on, or can they be separate? E.g. GloVe trained on 100k words, text_to_train containing 50k words.

Thank you for the assistance!


1 Answer

kiriloff

Embeddings are word encodings. You load a pre-trained GloVe encoding "dictionary" with 400,000 entries, where each token (entry) is encoded as a 1-D vector of dimension 50 for GloVe-50, 100 for GloVe-100, etc.

Each word of your input data then goes through this encoding: its GloVe vector is stored in a row of an embedding matrix of shape (N, 50), (N, 100), etc., where N is the size of your vocabulary.
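
A rough sketch of that step (the glove dictionary and vocab mapping below are toy stand-ins for the real GloVe file and the vocabulary built from your training text):

```python
# Build the embedding matrix: one GloVe row per word of the model's vocabulary.
import numpy as np

embedding_dim = 100
glove = {"the": np.random.rand(embedding_dim).astype(np.float32),
         "also": np.random.rand(embedding_dim).astype(np.float32)}
vocab = {"the": 0, "also": 1, "somerareword": 2}   # word -> integer index

embedding_matrix = np.zeros((len(vocab), embedding_dim), dtype=np.float32)
for word, idx in vocab.items():
    vector = glove.get(word)
    if vector is not None:
        embedding_matrix[idx] = vector
    # words missing from GloVe keep an all-zero row (random init also works)
```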

You then build a Keras Embedding layer from this embedding matrix, whose output is fed into the LSTM.

https://keras.io/examples/nlp/pretrained_word_embeddings/
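
A minimal sketch of that last step, along the lines of the linked example (the layer sizes and the stand-in matrix are placeholders):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# stand-in for the embedding matrix built above
embedding_matrix = np.zeros((10000, 100), dtype=np.float32)
vocab_size, embedding_dim = embedding_matrix.shape

model = keras.Sequential([
    layers.Embedding(vocab_size, embedding_dim,
                     embeddings_initializer=keras.initializers.Constant(embedding_matrix),
                     trainable=False),   # freeze the pre-trained vectors, or True to fine-tune
    layers.LSTM(128),
    layers.Dense(vocab_size, activation="softmax"),   # next-word prediction
])
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```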