I am currently using the Word2Vec model trained on the Google News corpus. Since it was trained only on news up to 2013, I need to update the vectors and add new words to the vocabulary based on news published after 2013.
Suppose I have a new corpus of news from after 2013. Can I retrain, fine-tune, or update the Google News Word2Vec model? Can it be done using Gensim? Can it be done using FastText?
You can have a look at this pull request: https://github.com/facebookresearch/fastText/pull/423
It does exactly what you want. Here is what the link says:
Training the classification model or word vector model incrementally.
-incr stands for incremental training.
When training word embeddings, one can either train from scratch on all the data each time, or train only on the new data. For classification, one can train from scratch with pre-trained word embeddings on all the data, or only on the new data, without changing the word embeddings.
Incremental training means that, having finished training a model on the data available earlier, we continue training that model on the newer data instead of starting from scratch.
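To make the idea concrete, here is a minimal pure-Python sketch of incremental training for word vectors. It is not the code from the pull request and not Gensim's or fastText's implementation; the class `TinySkipGram`, its methods, and the toy corpora are hypothetical names made up for illustration. It only shows the pattern: keep the vectors already learned, add rows for unseen words, and continue training on the new data.

```python
import math
import random


class TinySkipGram:
    """Toy skip-gram with negative sampling (illustrative only, not a library API)."""

    def __init__(self, dim=8, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.index = {}   # word -> row number
        self.w_in = []    # input (word) vectors
        self.w_out = []   # output (context) vectors

    def add_words(self, sentences):
        """Grow the vocabulary; rows of already known words are left untouched."""
        added = []
        for sentence in sentences:
            for word in sentence:
                if word not in self.index:
                    self.index[word] = len(self.w_in)
                    self.w_in.append(
                        [self.rng.uniform(-0.5, 0.5) / self.dim for _ in range(self.dim)]
                    )
                    self.w_out.append([0.0] * self.dim)
                    added.append(word)
        return added

    def train(self, sentences, epochs=5, window=2, negatives=2, lr=0.05):
        """Continue SGD from the current weights; call add_words() first."""
        for _ in range(epochs):
            for sentence in sentences:
                ids = [self.index[word] for word in sentence]
                for pos, center in enumerate(ids):
                    start = max(0, pos - window)
                    stop = min(len(ids), pos + window + 1)
                    for ctx_pos in range(start, stop):
                        if ctx_pos == pos:
                            continue
                        context = ids[ctx_pos]
                        self._update(center, context, 1.0, lr)
                        for _ in range(negatives):
                            negative = self.rng.randrange(len(self.w_in))
                            if negative != context:
                                self._update(center, negative, 0.0, lr)

    def _update(self, center, target, label, lr):
        """One logistic-regression step on a (center, target) pair."""
        v, u = self.w_in[center], self.w_out[target]
        score = sum(a * b for a, b in zip(v, u))
        pred = 1.0 / (1.0 + math.exp(-score))
        step = lr * (label - pred)
        for i in range(self.dim):
            v_i = v[i]
            v[i] += step * u[i]
            u[i] += step * v_i


# Toy corpora standing in for "news until 2013" and "news after 2013".
old_news = [["stocks", "fell", "on", "monday"], ["markets", "rallied", "on", "friday"]]
new_news = [["bitcoin", "fell", "on", "monday"], ["bitcoin", "markets", "rallied"]]

# Initial training on the old data.
model = TinySkipGram()
model.add_words(old_news)
model.train(old_news)
old_vocab_size = len(model.index)
stocks_before = list(model.w_in[model.index["stocks"]])
fell_before = list(model.w_in[model.index["fell"]])

# Incremental step: extend the vocabulary, then keep training on new data only.
new_words = model.add_words(new_news)
model.train(new_news)

print("words added:", new_words)
print("vocabulary size:", old_vocab_size, "->", len(model.index))
```

In this sketch, words that appear only in the old corpus keep their word vectors, while words that appear in the new corpus are adjusted or added. Gensim follows the same pattern with `build_vocab(new_sentences, update=True)` followed by `train(...)` on a full `Word2Vec` model; check the documentation of your installed version for the exact arguments.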