Check the similarity of two sentences


To check how similar two sentences are, I used the text-embedding-ada-002 model on Azure OpenAI. However, it is not very accurate with negations and antonyms. For example, for the two sentences "I hate eating candy" and "I like eating candy", the similarity is 0.927. Does this mean I'm using the wrong model, or is there something I need to adjust?

Below is the Python code that computes the similarity of the two sentences:

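import numpy as np
import openai

# `dict` here is a variable holding the two input texts (it shadows the built-in).
resp = openai.Embedding.create(
    input=[dict["text1"], dict["text2"]],
    engine="solize-dokushokai-openai-embeddings")

embedding_a = resp['data'][0]['embedding']
embedding_b = resp['data'][1]['embedding']

# Dot product of the two embeddings; this equals cosine similarity when both have unit length.
similarity_score = np.dot(embedding_a, embedding_b)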


1 answer

Answer by William Westerkamp:

I think this is a limitation of the model, especially in how it captures negations and antonyms. The two sentences differ by a single word and share the same topic, so their embeddings end up close together. It may be worth trying a different model that is known to perform well on semantic similarity tasks.
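To see why a single swapped word barely moves the score, here is a toy sketch that uses a plain bag-of-words vector in place of a real embedding. It is not how text-embedding-ada-002 works internally; the `bag_of_words` and `cosine_similarity` helpers are made up for illustration. It only shows that two sentences sharing most of their words can get a high cosine similarity regardless of meaning.

```python
import math
from collections import Counter


def bag_of_words(sentence):
    """Toy stand-in for an embedding: count each lowercased word."""
    return Counter(sentence.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence has no direction to compare
    return dot / (norm_a * norm_b)


score = cosine_similarity(
    bag_of_words("I hate eating candy"),
    bag_of_words("I like eating candy"),
)
print(round(score, 3))  # 0.75: three of the four words are shared
```

A learned embedding is far richer than word counts, but the same effect applies: shared wording and topic pull the vectors together, and one opposing word may not push them far apart.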

Here are some options:

  1. Universal Sentence Encoder (USE): developed by Google

  2. BERT (Bidirectional Encoder Representations from Transformers)

  3. RoBERTa: a variant of BERT that further refines the training process

  4. Sentence-BERT (SBERT): an extension of BERT designed specifically for computing sentence embeddings
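Whichever model you try, it may help to measure it on a few sentence pairs of your own before switching. The sketch below assumes a hypothetical `embed` function that maps a sentence to a list of floats (for example, a thin wrapper around one of the models above). `rank_pairs` and the `fake_embed` stand-in are illustrative names, not part of any library.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_pairs(embed, pairs):
    """Score each (sentence_a, sentence_b) pair, highest similarity first."""
    scored = [(cosine_similarity(embed(a), embed(b)), a, b) for a, b in pairs]
    return sorted(scored, reverse=True)


# Stand-in embedder (an assumption) so the sketch runs without any model;
# replace it with a call to the model being evaluated.
def fake_embed(sentence):
    vocabulary = ["like", "love", "hate", "candy"]
    words = sentence.lower().split()
    return [float(words.count(w)) for w in vocabulary]


pairs = [
    ("I like candy", "I love candy"),  # paraphrase
    ("I like candy", "I hate candy"),  # opposite meaning
]
for score, a, b in rank_pairs(fake_embed, pairs):
    print(f"{score:.3f}  {a!r} vs {b!r}")
```

A model that handles negation and antonyms well should rank the paraphrase pair clearly above the opposite pair. The word-count stand-in cannot tell them apart, which is the kind of failure this check is meant to surface.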

Good luck!