Find most similar words for OOV word

Question

259 views Asked by N0rA At 22 May 2020 at 10:29

I am looking for the most similar words for out-of-vocabulary (OOV) words using gensim. Something like this:

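    def get_word_vec(self, model, word):
        try:
            # gensim 3.x API; in gensim 4+ the check is `word not in model.wv.key_to_index`
            if word not in model.wv.vocab:
                # Desired behaviour: nearest known words for the OOV word.
                # With a plain Word2Vec model this lookup raises a KeyError.
                mostSimWord = model.wv.similar_by_word(word)
                print(mostSimWord)
            else:
                print(word)
        except Exception as ex:
            print(ex)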

Is there a way to achieve this task? Options other than gensim are also welcome.

There is 1 answer

**gojomo** · Answer 1 · 2020-05-22T18:21:24+00:00

If you train a FastText model instead of a Word2Vec model, it inherently learns vectors for word-fragments (of configurable size ranges) in addition to full words.

In languages like English & many others (but not all), unknown words are often typos, alternate forms, or related in terms of roots and suffixes to known words. Thus, having vectors for subwords, then using those to tally up a good guess vector for an unknown word, can work well enough to be worth trying – better than ignoring such words, or using a totally random or origin-point vector.
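
To make the subword idea concrete, here is a minimal pure-Python sketch of how a guess vector for an unseen word can be assembled from character n-gram vectors. It is a simplification rather than FastText's actual implementation: the helper names `char_ngrams` and `oov_vector` are hypothetical, the 2-d n-gram vectors are invented toy values, and real FastText hashes n-grams into buckets instead of keeping a plain dictionary.

```python
def char_ngrams(word, min_n=3, max_n=5):
    """Character n-grams of the word, wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [
        wrapped[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of those n-grams of `word` that have a known vector."""
    known = [ngram_vectors[g] for g in char_ngrams(word) if g in ngram_vectors]
    if not known:
        # No n-gram overlaps with anything seen before: fall back to the origin.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy n-gram vectors, invented purely for illustration.
toy_ngram_vectors = {
    "<ru": [1.0, 0.0],
    "run": [1.0, 0.2],
    "unn": [0.8, 0.2],
    "ing": [0.0, 1.0],
    "ng>": [0.0, 1.0],
}

# "runnning" (a typo) was never seen as a whole word, but shares n-grams
# with known material, so it still gets a non-trivial vector.
print(char_ngrams("run", 3, 3))
print(oov_vector("runnning", toy_ngram_vectors, dim=2))
```

With gensim 4.x, the rough equivalent is to train `FastText(sentences, min_n=3, max_n=6)` (the values are placeholders) and then call `model.wv.most_similar(oov_word)`, which should return neighbours even when `oov_word` is not in `model.wv.key_to_index`.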

There's no built-in way to try to extract such relationships from an existing set of word-vectors that isn't FastText/subword-based – but it'd be theoretically possible. You could compute edit distances to, or counts-of-shared-subwords with, all known words, & create a guess-vector by weighted combination of the N-closest words. (This might work really well with typos & rarer alternate spellings, but not as much for truly-absent novel words.)
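
A rough sketch of that weighted-combination idea, using only the standard library. Note the assumptions: `guess_vector` is a hypothetical helper, the vectors are toy values, and `difflib.SequenceMatcher.ratio()` stands in for a true edit distance or shared-subword count as the spelling-similarity score.

```python
from difflib import SequenceMatcher


def guess_vector(oov_word, known_vectors, top_n=3):
    """Similarity-weighted average of the vectors of the top_n known words
    whose spelling is closest to `oov_word`."""
    # Score every known word by spelling similarity; keep the best top_n.
    scored = sorted(
        ((SequenceMatcher(None, oov_word, w).ratio(), w) for w in known_vectors),
        reverse=True,
    )[:top_n]
    total = sum(score for score, _ in scored)
    if total == 0:
        return None, []  # nothing resembles the unknown word at all
    dim = len(next(iter(known_vectors.values())))
    vec = [
        sum(score * known_vectors[w][i] for score, w in scored) / total
        for i in range(dim)
    ]
    return vec, [w for _, w in scored]


# Toy word vectors, invented purely for illustration.
toy_vectors = {
    "similar": [1.0, 0.0],
    "similarity": [0.9, 0.1],
    "banana": [0.0, 1.0],
    "vector": [0.5, 0.5],
}

vec, neighbours = guess_vector("similiar", toy_vectors, top_n=2)
print(neighbours, vec)
```

With a real gensim model, `known_vectors` would be built from the model's vocabulary, and the resulting guess could be passed to `model.wv.similar_by_vector(vec)` to list nearby known words. Scoring the whole vocabulary for every query is linear in vocabulary size, so a large model would probably need some candidate pre-filtering.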
