To measure the similarity of two sentences, I used the text-embedding-ada-002 model from Azure OpenAI. However, it is not very accurate with negated sentences and antonyms. For example, for the two sentences "I hate eating candy" and "I like eating candy", the similarity is 0.927. Does this mean I'm using the wrong model, or is there something I need to adjust?
Below is the Python code I use to compute the similarity of the two sentences:
import numpy as np
import openai

texts = {"text1": "I hate eating candy", "text2": "I like eating candy"}
resp = openai.Embedding.create(
    input=[texts["text1"], texts["text2"]],
    engine="solize-dokushokai-openai-embeddings")  # Azure deployment name
embedding_a = resp['data'][0]['embedding']
embedding_b = resp['data'][1]['embedding']
# ada-002 embeddings are unit length, so the dot product is the cosine similarity
similarity_score = np.dot(embedding_a, embedding_b)
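(A note for anyone reusing this snippet with other models: ada-002 embeddings come back unit-normalized, which is why the plain dot product works as a cosine similarity. For models that do not normalize their outputs, an explicit cosine computation is safer; the helper below is a sketch of my own, not part of any library.)

import numpy as np

def cosine_similarity(a, b):
    # Full cosine similarity; reduces to np.dot(a, b) when the vectors
    # are already unit-normalized, as ada-002 embeddings are.
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))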
I think this is a limitation of the model itself, especially in capturing negations and antonyms: a sentence and its negation share almost all of their words and context, so their embeddings end up close together. It may be a good idea to try a different model that is known to perform well on semantic similarity tasks.
Here are some options:
Universal Sentence Encoder (USE): developed by Google for general-purpose sentence embeddings
BERT (Bidirectional Encoder Representations from Transformers)
RoBERTa: a variant of BERT with a more robust training procedure
Sentence-BERT (SBERT): an extension of BERT designed specifically for computing sentence embeddings; see the sketch below
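As a concrete starting point, here is a minimal sketch of the SBERT route using the sentence-transformers library. The checkpoint all-MiniLM-L6-v2 is just one common choice, not the only option, and the example sentences are reused from your question.

from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a small, general-purpose SBERT checkpoint;
# other checkpoints may handle negation better or worse.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = ["I hate eating candy", "I like eating candy"]
embeddings = model.encode(sentences)

# util.cos_sim returns a 1x1 tensor of pairwise cosine similarities
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(score)

Be aware that embedding models in general may still score negated pairs fairly high, since the two sentences share most of their words; if you specifically need to detect contradictions, a natural language inference (NLI) model is usually the better tool.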
Good luck!