I would like to know if there is a general (default) out-of-vocabulary (OOV) token for GloVe embeddings, in particular for the pre-trained ones from Stanford: https://nlp.stanford.edu/projects/glove/
I found this on SO: What is "unk" in glove.6B.50d.txt?
The answer there suggests that the token "unk" represents the OOV token, and it links to the GloVe project on GitHub as evidence.
However, this doesn't seem very conclusive to me: the linked code only refers to "<unk>" tokens (not "unk"), but "<unk>" does not exist in the vocabulary!
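One way to check which of the two strings is actually present is to scan the vocabulary file directly. The sketch below assumes the plain-text GloVe format (one word per line, followed by its space-separated vector components) and that words themselves contain no spaces, which holds for the glove.6B files but may not for every release. The function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def load_glove_vocab(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines into a word -> vector dict.

    Assumes each line is 'word v1 v2 ... vn' separated by single spaces
    and that the word contains no spaces (true for glove.6B).
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
    return vectors


def check_oov_candidates(
    vocab: Dict[str, List[float]],
    candidates: Tuple[str, ...] = ("unk", "<unk>"),
) -> Dict[str, bool]:
    """Report, for each candidate OOV string, whether it is in the vocabulary."""
    return {tok: tok in vocab for tok in candidates}
```

To run it on the real data, pass an open file handle such as open("glove.6B.50d.txt", encoding="utf-8") to load_glove_vocab (the path is an assumption about where the file is stored locally), then print the result of check_oov_candidates.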
So I would like to know whether there is any (default) OOV token for GloVe (one that can be used for unknown/unseen words), and if so, what it is.
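If it turns out there is no dedicated OOV token, one workaround sometimes used is to fall back to the average of all vectors for unseen words. This is a convention on the user side, not something the GloVe release defines. A minimal sketch, assuming the embeddings are already loaded into a dict mapping each word to an equal-length vector; mean_vector and lookup are hypothetical helper names:

```python
from typing import Dict, List, Sequence


def mean_vector(vectors: Dict[str, Sequence[float]]) -> List[float]:
    """Element-wise average of all embedding vectors (one possible OOV fallback)."""
    if not vectors:
        raise ValueError("empty vocabulary")
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for vec in vectors.values():
        for i, x in enumerate(vec):
            total[i] += x
    n = len(vectors)
    return [x / n for x in total]


def lookup(
    word: str,
    vectors: Dict[str, Sequence[float]],
    oov: Sequence[float],
) -> List[float]:
    """Return the word's vector, or the fallback vector if the word is unseen."""
    return list(vectors.get(word, oov))
```

A zero vector or a small random vector are other common choices for unseen words; which works best depends on the downstream model.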