I have run into a problem when using the pretrained model w2v_512.model.
The error is "Key 'xxx' is not present".
I think this is because the word 'xxx' cannot be converted to an embedding by w2v_512.model, since the model never saw this word during pre-training.
I want to know how to solve it. Would it help to use BERT embeddings instead? If so, how do I use BERT to get the embedding?
I would appreciate it if anybody could answer me!
A set of word2vec vectors can only provide vectors for words that were included at training-time.
You could:

* skip or ignore any words that aren't in the model's vocabulary (see the membership-check sketch below);
* train (or continue training) a word2vec model on a corpus that actually contains those words; or
* switch to a model that can synthesize vectors for out-of-vocabulary words from subword information, such as FastText.
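If you stay with the word2vec vectors, a minimal sketch of the membership check might look like the following. It assumes w2v_512.model was saved as gensim KeyedVectors; if it is a full Word2Vec model, load it with Word2Vec.load(...) and use its .wv attribute instead.

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumption: w2v_512.model was saved as gensim KeyedVectors.
kv = KeyedVectors.load("w2v_512.model")

def safe_vector(word, kv):
    """Return the word's vector, or a zero vector for out-of-vocabulary words."""
    if word in kv:                       # membership check avoids the KeyError
        return kv[word]
    return np.zeros(kv.vector_size)      # or simply skip/ignore the word
```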
I believe that BERT models also understand words built up from subword tokens, a bit like FastText, so they can offer an embedding for arbitrary words. So, you could try that and see if it works for you. But the quality of any such embedding will still depend on how well the model was trained around that word & similar words. So, you should always check how well the results are working for your goals – the mere fact a model can return an embedding you can use isn't enough to be sure that embedding is worth using.
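As a rough sketch of how you might get a BERT embedding for a single word with the Hugging Face transformers library (the checkpoint "bert-base-uncased" is just a common default, not anything your setup requires):

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Any BERT checkpoint will do; "bert-base-uncased" is an assumed default.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_word_embedding(word):
    """Mean-pool the hidden states of the word's subword tokens into one vector."""
    inputs = tokenizer(word, return_tensors="pt", add_special_tokens=False)
    with torch.no_grad():
        outputs = model(**inputs)
    # last_hidden_state has shape (1, num_subword_tokens, hidden_size)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

vec = bert_word_embedding("xxx")   # works even for words word2vec never saw
print(vec.shape)                   # e.g. torch.Size([768]) for bert-base
```

Note that BERT embeddings are contextual: embedding a word in isolation like this throws away the context that makes them strong, so for many tasks it works better to run the whole sentence through the model and pull out the vectors for the tokens of the word you care about.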