Fine-tuning the pre-trained Google News word2vec model


I am currently using the Word2Vec model trained on the Google News corpus (from here). Since it is trained only on news up to 2013, I need to update the vectors and also add new words to the vocabulary based on news published after 2013.

Suppose I have a new corpus of news from after 2013. Can I re-train, fine-tune, or update the Google News Word2Vec model? Can it be done with Gensim? Can it be done with fastText?


There are 2 answers

Answer by shasvat desai

You can have a look at this: https://github.com/facebookresearch/fastText/pull/423

It does exactly what you want. Here is what the PR description says:

Training the classification model or word vector model incrementally.

./fasttext [supervised | skipgram | cbow] -input train.data -inputModel trained.model.bin -output re-trained [other options] -incr

-incr stands for incremental training.

When training word embeddings, one could either train from scratch with all the data each time, or train just on the new data. For classification, one could train from scratch (with pre-trained word embeddings) on all the data, or only on the new data, leaving the word embeddings unchanged.

Incremental training means that, having finished training a model on the data we had before, we retrain that model on the newer data rather than starting from scratch.
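For example, assuming you have built fastText from that pull request's branch (the -incr option is part of the PR, not a fastText release) and that trained.model.bin is a model previously trained with fastText itself (the word2vec-format Google News binary cannot be loaded directly as -inputModel), continuing skipgram training on a file of post-2013 news could look like this, with hypothetical file names:

./fasttext skipgram -input news_2014.txt -inputModel trained.model.bin -output retrained -incr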

Answer by ashutosh singh

Yes, you can. I have been working on this recently as well.
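For Gensim, one caveat: the published GoogleNews-vectors-negative300.bin file contains only the final word vectors, not the full model state (hidden-layer weights, vocabulary statistics), so you cannot load it and simply resume training. The usual workaround is to seed a fresh Word2Vec model with the pre-trained vectors for overlapping words and then train on the new corpus. A minimal sketch, assuming the Gensim 4.x API and a hypothetical tokenized corpus:

    from gensim.models import Word2Vec, KeyedVectors

    # Hypothetical tokenized post-2013 news corpus
    new_sentences = [
        ["apple", "unveils", "new", "iphone"],
        ["central", "bank", "raises", "interest", "rates"],
    ]

    # Load the pre-trained Google News vectors (300-d, word2vec binary format)
    pretrained = KeyedVectors.load_word2vec_format(
        "GoogleNews-vectors-negative300.bin", binary=True
    )

    # Build a fresh model whose vocabulary comes from the new corpus
    model = Word2Vec(vector_size=300, min_count=1)
    model.build_vocab(new_sentences)

    # Seed overlapping words with the pre-trained vectors; words absent
    # from Google News keep their random initialization
    for word in model.wv.index_to_key:
        if word in pretrained:
            model.wv.vectors[model.wv.key_to_index[word]] = pretrained[word]

    # Continue training on the new corpus only
    model.train(new_sentences, total_examples=model.corpus_count, epochs=5)

Note that train() updates the seeded vectors as well, so with a small new corpus the pre-trained vectors can drift; limiting the number of epochs or the learning rate helps.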

Edit: GloVe has the overhead of computing and storing a co-occurrence matrix in memory during training, while training word2vec is comparatively cheap.