Is there a way to get the relationship from 'GloVe' word2vec?


I am using GloVe vectors through Gensim's word2vec module, and I can use it to get a similarity score between entities; for example, 'man' and 'woman' return 0.89034. But is there a way to return the semantic relationship between two entities? For example, given a 'people' word and a 'location' word, the result should be the relationship 'lives_in'.

I can do something like:

print(model.most_similar(positive=['king', 'woman'], negative=['man']))

Output is:

[('queen', 0.775162398815155), ('prince', 0.6123066544532776), ('princess', 0.6016970872879028), ('kings', 0.5996100902557373), ('queens', 0.565579891204834), ('royal', 0.5646308660507202), ('throne', 0.5580971240997314), ('Queen', 0.5569202899932861), ('monarch', 0.5499411821365356), ('empress', 0.5295248627662659)]

Desired output:

[(is_a, 0.3223), (same_as, 0.349230), (people, 0.302432) ...]

There are 2 answers

0
gojomo

Not really, as word-vectors don't really know such relationships by name.

Rather, it's just a useful, happy result of the training-process that words arrange themselves in ways that reflect both pairwise similarity and, in certain relative-directions, a vague concordance with our mental models of types-of-relationships.

As useful as these directions are, even relationships as sharp as "part_of" (meronymy) or "more_specific_example_of" (hyponymy) may not have strong, consistent directions in the vector-space.
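One rough way to check whether a relationship has a consistent direction is to take several word pairs that share the relationship, compute each pair's offset vector, and see how well those offsets agree with one another. The sketch below is only an illustration: the function names `offset` and `direction_consistency` are hypothetical, and the 3-d vectors are made up so the example runs without a trained model, so the numbers say nothing about real embeddings. With a real model, any object that supports `vectors[word]` lookups (such as Gensim's KeyedVectors) could be passed in place of the toy dictionary.

```python
import itertools
import numpy as np

def offset(vectors, head, tail):
    """Unit-length difference vector pointing from `head` to `tail`."""
    diff = np.asarray(vectors[tail], dtype=float) - np.asarray(vectors[head], dtype=float)
    return diff / np.linalg.norm(diff)

def direction_consistency(vectors, pairs):
    """Mean pairwise cosine similarity between the offsets of the given pairs.

    Values near 1.0 suggest the pairs share a direction; values near 0
    (or negative) suggest the relation has no stable direction.
    """
    offsets = [offset(vectors, h, t) for h, t in pairs]
    sims = [float(np.dot(a, b)) for a, b in itertools.combinations(offsets, 2)]
    return sum(sims) / len(sims)

# Made-up toy vectors (an assumption for illustration, not real embeddings).
toy = {
    "man": np.array([1.0, 0.0, 0.2]),
    "woman": np.array([1.0, 1.0, 0.2]),
    "king": np.array([3.0, 0.1, 1.0]),
    "queen": np.array([3.0, 1.0, 1.1]),
    "wheel": np.array([0.5, 0.2, 2.0]),
    "car": np.array([2.0, 0.1, 0.3]),
    "finger": np.array([0.1, 0.9, 0.4]),
    "hand": np.array([0.3, 0.2, 1.5]),
}

gender_pairs = [("man", "woman"), ("king", "queen")]
part_of_pairs = [("wheel", "car"), ("finger", "hand")]

gender_consistency = direction_consistency(toy, gender_pairs)
part_of_consistency = direction_consistency(toy, part_of_pairs)
print("gender:", gender_consistency)
print("part_of:", part_of_consistency)
```

In this contrived data the gender offsets line up closely while the part_of offsets do not. Running the same check on real vectors, with many more pairs per relation, would give a feel for which relations a given model encodes as a usable direction.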

And for your example of 'man' X 'woman', and the suggestion that X='similar_to' would be a suitable answer, that seems muddled to me. Usually 'man' and 'woman' are placed in contrast to emphasize some gender-related difference/direction. That they are similar is little more interesting to say than that a word is similar_to its 10 nearest neighbors, or 100 nearest neighbors, or 10,000 nearest neighbors (compared to all the other words in the model). You can easily read many similar_to relationships out of the model, but pairs which saliently isolate aspects of human perception can be harder to label/identify. (For example, 'hot' and 'cold' are fairly similar, as they're used in similar contexts, but they are also semantically antonyms, in that they are specifically used to highlight exclusive and opposite temperature-levels compared to some frame-of-reference.)

There is more advanced work that explicitly tries to create word-vector sets that are more capable of answering questions, especially property-related questions - but standard word-vectors won't do especially well on such things.

5
Venkatachalam

First, decide on an ideal example pair for a person and a location.

E.g., person Trump and location whitehouse.

Then, for a new person George and location California,

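you can do the following math to calculate the lives_in score:

Cosine similarity between (B - A + C) and D, where A = Trump, B = whitehouse, C = George, and D = California

Implementation:

from scipy.spatial.distance import cosine
# scipy's cosine() returns a distance, so subtract it from 1 to get a similarity.
# The offset is location - person: whitehouse - Trump + George should land near California.
1 - cosine(model.wv['whitehouse'] - model.wv['Trump'] + model.wv['George'], model.wv['California'])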
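The same idea can be extended to several candidate relationships at once, which gets closer to the ranked list asked for in the question. The following is a minimal sketch under stated assumptions: `relation_prototypes` and `score_relations` are hypothetical helper names, the relation labels and exemplar pairs have to be supplied by hand, and the toy 3-d vectors are invented so the code runs without a trained model. With real vectors, tokens such as 'whitehouse' may not be in the vocabulary, and the scores are likely to be much noisier.

```python
import numpy as np

def unit(v):
    """Scale a vector to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def relation_prototypes(vectors, exemplars):
    """Build one direction per relation by averaging the unit offsets
    (tail - head) of that relation's hand-picked example pairs."""
    return {
        rel: unit(np.mean([unit(vectors[t] - vectors[h]) for h, t in pairs], axis=0))
        for rel, pairs in exemplars.items()
    }

def score_relations(vectors, head, tail, prototypes):
    """Rank relations by cosine similarity between the query offset
    (tail - head) and each relation's prototype direction."""
    query = unit(vectors[tail] - vectors[head])
    scores = [(rel, float(np.dot(query, proto))) for rel, proto in prototypes.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)

# Made-up toy vectors (an assumption for illustration, not real embeddings).
toy = {
    "Trump": np.array([1.0, 0.0, 0.0]),
    "whitehouse": np.array([1.0, 1.0, 0.0]),
    "George": np.array([2.0, 0.1, 0.3]),
    "California": np.array([2.0, 1.0, 0.4]),
    "dog": np.array([0.0, 0.2, 1.0]),
    "animal": np.array([0.5, 0.2, 2.0]),
}

# Hand-labelled example pairs; one per relation here, more would be better.
exemplars = {
    "lives_in": [("Trump", "whitehouse")],
    "is_a": [("dog", "animal")],
}

prototypes = relation_prototypes(toy, exemplars)
ranked = score_relations(toy, "George", "California", prototypes)
print(ranked)
```

Averaging over many exemplar pairs per relation, rather than relying on a single ideal pair, should make the prototype less sensitive to the quirks of any one word. The result is still only a heuristic, since the model itself has no notion of named relationships.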