I'd like to use a good large language model to estimate how similar the meanings of two strings are, for example "cat" and "someone who likes to play with yarn", or "cat" and "car".
Maybe some libraries provide a function for comparing strings, or we could implement some method such as measuring the similarity of their embeddings in a deep layer or whatever is appropriate.
I hope that something without much boilerplate code is possible. Something like:
import language_models, math
my_llm = language_models.load('llama2')
print(math.dist(
    my_llm.embedding('cat'),
    my_llm.embedding('someone who likes to play with yarn')))
Ideally, it should be easy to try different recent LLMs. (In the "example" above, that would mean replacing 'llama2' with another model name.)
spaCy is the way:
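A minimal sketch of what this could look like, not necessarily the exact code originally posted. It assumes spaCy is installed and that the `en_core_web_md` pipeline has been downloaded (the small `_sm` pipelines ship without word vectors, so their similarity scores are less meaningful). The helper names `cosine_similarity` and `compare_with_spacy` are made up for illustration; only `spacy.load` and `Doc.similarity` are spaCy's own API. The pure-Python `cosine_similarity` is included to show the kind of score that `Doc.similarity` returns by default.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (range -1 to 1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Convention chosen for this sketch: all-zero vectors score 0.
        return 0.0
    return dot / (norm_u * norm_v)


def compare_with_spacy(pairs, model_name="en_core_web_md"):
    """Return one similarity score per (text_a, text_b) pair.

    model_name is an assumption: any installed spaCy pipeline that
    includes word vectors can be substituted here.
    """
    import spacy  # imported lazily so cosine_similarity works without spaCy

    nlp = spacy.load(model_name)  # load the pipeline once, reuse for all pairs
    return [nlp(a).similarity(nlp(b)) for a, b in pairs]


# Example usage (requires `pip install spacy` and
# `python -m spacy download en_core_web_md`):
#   scores = compare_with_spacy([
#       ("cat", "someone who likes to play with yarn"),
#       ("cat", "car"),
#   ])
#   print(scores)
```

Trying a different model then means changing `model_name`, for example to `en_core_web_lg` if that pipeline is installed.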
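The output is a single similarity value. It is a cosine similarity, so in principle it lies between -1 and 1, although for typical text it usually falls between 0 and 1. A value of 1 means the vectors for the two sentences point in the same direction, as they do for identical sentences, and a value near 0 means the sentences have little in common. This value can give you a rough idea of how similar the meanings of the two sentences are.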