GloVe embeddings - unknown / out-of-vocabulary token


I would like to know if there is a general (default) out-of-vocabulary (OOV) token for GloVe embeddings, in particular for the pre-trained ones from Stanford: https://nlp.stanford.edu/projects/glove/

I found this on SO: What is "unk" in glove.6B.50d.txt?

The answer given there suggests that the token "unk" represents the OOV token and links to the GloVe project on GitHub as evidence.
However, this doesn't seem very conclusive to me: the linked code only refers to "<unk>" tokens (not "unk"), and "<unk>" does not exist in the vocabulary!
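
For anyone who wants to check which of the two spellings is actually present in a particular vectors file, here is a minimal sketch. It assumes the plain-text layout of the pre-trained downloads (one entry per line, the word first, followed by space-separated floats). The function name `tokens_in_glove_file` is just something I made up for illustration; it is not part of the GloVe code.

```python
from typing import Iterable, Set


def tokens_in_glove_file(path: str, candidates: Iterable[str]) -> Set[str]:
    """Return the subset of `candidates` that occur as words in a GloVe text file.

    Assumption: each line is "<word> <float> <float> ...", so the word is
    everything before the first space.
    """
    wanted = set(candidates)
    found: Set[str] = set()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            word = line.split(" ", 1)[0]
            if word in wanted:
                found.add(word)
                # Stop early once every candidate has been seen.
                if found == wanted:
                    break
    return found
```

Calling `tokens_in_glove_file("glove.6B.50d.txt", ["unk", "<unk>"])` (the path is an assumption, adjust it to wherever the file was unpacked) returns the set of candidates that appear as vocabulary entries. Note that this only shows whether a token is present, not whether it was intended as the OOV token.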

So I would like to know whether there is a (default) OOV token for GloVe that can be used for unknown/unseen words and, if so, what it is.

