I noticed that a sentence, say "This is a first sentence", produces a slightly different embedding depending on which other sentences are encoded in the same call:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")

# Encode the same sentence on its own and alongside a second sentence
embeddings1 = model.encode(["This is a first sentence"])
embeddings2 = model.encode(["This is a first sentence", "This is another sentence"])

# Inspect the first five dimensions of each embedding
embeddings1[0, :5]
embeddings2[0, :5]
embeddings2[1, :5]
This produces the following output:
array([0.026788 , 0.02391568, 0.00314784, 0.10020158, 0.02555996], dtype=float32)
array([0.02678801, 0.02391565, 0.00314785, 0.10020156, 0.02555998], dtype=float32)
array([ 0.01069314, -0.02397677, 0.0074933 , 0.03367725, 0.04727736], dtype=float32)
Notice that the first two vectors in that output are not quite the same: they differ in roughly the eighth decimal place. The same thing happens with a couple of other sentence-transformers models I've tested.
Is this due to rounding error, and if so, how does it arise? Or is it expected behavior that has some explanation, e.g., in the attention mechanism?
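In case it helps, here is a minimal sketch of how I would probe this, assuming the batching itself is the trigger (the padding hypothesis is my guess, not something I've confirmed). encode accepts a batch_size argument; with batch_size=1 each sentence passes through the model alone, so no padding to a common sequence length happens inside a batch:

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")
sentences = ["This is a first sentence", "This is another sentence"]

# Same sentence encoded alone vs. inside a two-sentence batch
alone = model.encode([sentences[0]])
batched = model.encode(sentences)

# Size of the discrepancy for the shared sentence (tiny, per the output above)
print(np.abs(alone[0] - batched[0]).max())

# With batch_size=1 each sentence is processed individually, so the batch
# contains no padding; if this difference is (near) zero while the one
# above is not, padding is the likely trigger
one_by_one = model.encode(sentences, batch_size=1)
print(np.abs(alone[0] - one_by_one[0]).max())

If the batch_size=1 vectors match the single-sentence call exactly, that would point at padding plus floating-point non-associativity in the batched matrix operations rather than at the attention mechanism itself.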