I am using GloVe and the Gensim word2vec module, and I can use them to return the similarity score between entities; for example, 'man' and 'woman' will return 0.89034. But is there a way to return the semantic relationship between two entities? For example, given the word 'people' and a 'location', the result should be the relationship 'lives_in'.
I can do something like:
print(model.most_similar(positive=['king', 'woman'], negative=['man']))
Output is:
[('queen', 0.775162398815155), ('prince', 0.6123066544532776), ('princess', 0.6016970872879028), ('kings', 0.5996100902557373), ('queens', 0.565579891204834), ('royal', 0.5646308660507202), ('throne', 0.5580971240997314), ('Queen', 0.5569202899932861), ('monarch', 0.5499411821365356), ('empress', 0.5295248627662659)]
Desired output:
[(is_a, 0.3223), (same_as, 0.349230), (people, 0.302432) ...]
Not really, as word-vectors don't really know such relationships by name.
Rather, it's just a useful, happy result of the training-process that words arrange themselves in ways that reflect both pairwise similarity and, in certain relative-directions, a vague concordance with our mental models of types-of-relationships.
As useful as these directions are, even relationships as sharp as "part_of" (meronymy) or "more_specific_example_of" (hyponymy) may not have strong, consistent directions in the vector-space.
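One rough way to probe whether a relationship has a consistent direction is to compare the difference vectors of several word pairs that share that relationship. The following is only a minimal sketch: the `toy_vectors` values are made up for illustration (they do not come from any trained model), and `relation_consistency` is a hypothetical helper, not part of Gensim.

```python
import math
from itertools import combinations

# Hypothetical toy vectors, invented for illustration only; a real model
# would supply vectors with hundreds of dimensions.
toy_vectors = {
    "man":    [1.0, 0.0, 0.2],
    "woman":  [1.0, 1.0, 0.2],
    "king":   [3.0, 0.1, 0.9],
    "queen":  [3.0, 1.1, 0.9],
    "wheel":  [0.2, 0.5, 2.0],
    "car":    [0.9, 0.1, 2.5],
    "finger": [1.5, 0.3, 0.1],
    "hand":   [1.4, 0.9, 0.6],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def offset(vectors, source, target):
    """Difference vector pointing from `source` to `target`."""
    return [t - s for s, t in zip(vectors[source], vectors[target])]

def relation_consistency(vectors, pairs):
    """Mean pairwise cosine similarity between the offsets of the given pairs."""
    offsets = [offset(vectors, s, t) for s, t in pairs]
    sims = [cosine(a, b) for a, b in combinations(offsets, 2)]
    return sum(sims) / len(sims)

gender = relation_consistency(toy_vectors, [("man", "woman"), ("king", "queen")])
part_of = relation_consistency(toy_vectors, [("wheel", "car"), ("finger", "hand")])
print(f"gender: {gender:.3f}, part_of: {part_of:.3f}")
```

With these invented numbers the two gender offsets point the same way while the two part_of offsets do not. Scores from a real model will be far less clean-cut, and a high score still gives you a direction, not a relationship name.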
As for your example of 'man' X 'woman', and the suggestion that X='similar_to' would be a suitable answer: that seems muddled to me. Usually 'man' and 'woman' are placed in contrast to emphasize some gender-related difference/direction. That they are similar is little more interesting than saying that a word is similar_to its 10 nearest neighbors, or 100 nearest neighbors, or 10,000 nearest neighbors (compared to all the other words in the model). You can easily read many similar_to relationships out of the model, but pairs which saliently isolate aspects of human perception can be harder to label/identify. (For example, 'hot' and 'cold' are fairly similar, as they're used in similar contexts, but are also semantic antonyms, in that they are specifically used to highlight exclusive and opposite temperature levels compared to some frame of reference.)
There is more advanced work that explicitly tries to create word-vector sets that are more capable of answering questions, especially property-related questions, but standard word-vectors won't do especially well on such things.
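To illustrate the earlier point about reading similar_to relationships out of a model, here is a minimal sketch of a nearest-neighbor lookup by cosine similarity. The `toy_vectors` values are made up for illustration and `nearest_neighbors` is a hypothetical helper; with a real Gensim model, `most_similar` with its `topn` argument plays the same role.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
toy_vectors = {
    "hot":    [0.9, 0.8, 0.1],
    "cold":   [0.8, 0.9, 0.1],
    "warm":   [0.9, 0.7, 0.2],
    "banana": [0.1, 0.2, 0.9],
    "piano":  [0.2, 0.1, 0.8],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(vectors, word, topn=2):
    """Return the `topn` other words most similar to `word`, best first."""
    query = vectors[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:topn]

neighbors = nearest_neighbors(toy_vectors, "hot")
print(neighbors)
```

In this toy data 'cold' lands among the closest neighbors of 'hot', which shows why a similar_to label alone says nothing about whether two words are synonyms or antonyms.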