Why do sentence transformers produce slightly different embeddings for the same text?


I noticed that a sentence, say, "This is a first sentence", produces a slightly different embedding depending on the context of other sentences that are encoded along with it:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("distiluse-base-multilingual-cased-v1")
embeddings1 = model.encode(["This is a first sentence"])
embeddings2 = model.encode(["This is a first sentence", "This is another sentence"])

# Inspect the first five components of each embedding (run interactively):
embeddings1[0, :5]
embeddings2[0, :5]
embeddings2[1, :5]

This produces the following output:

array([0.026788  , 0.02391568, 0.00314784, 0.10020158, 0.02555996], dtype=float32)
array([0.02678801, 0.02391565, 0.00314785, 0.10020156, 0.02555998], dtype=float32)
array([ 0.01069314, -0.02397677,  0.0074933 ,  0.03367725,  0.04727736], dtype=float32)

Notice that the first two vectors in that output, which both encode "This is a first sentence", are not exactly identical; they differ in the last few decimal places. The same thing happens with a couple of other sentence-transformer models I've tested.

Is this due to rounding error, and if so, how does it arise? Or is it expected behavior that can be explained in some other way, e.g., by the attention mechanism?
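
In case it is relevant, here is a small follow-up check (it assumes numpy is installed and reuses model, embeddings1, and embeddings2 from the snippet above) to quantify the discrepancy and to see whether encoding the sentences one at a time changes anything:

import numpy as np

# Maximum absolute difference between the two encodings of the same sentence;
# for the first few values shown above it is on the order of 1e-8, i.e. float32-level noise.
print(np.abs(embeddings1[0] - embeddings2[0]).max())

# Encode the two sentences one at a time (batch_size=1), so that each sentence
# goes through the model in its own batch, then compare against embeddings1 again.
single = model.encode(["This is a first sentence", "This is another sentence"], batch_size=1)
print(np.abs(single[0] - embeddings1[0]).max())

With batch_size=1 the padding length no longer depends on the other sentences in the batch, so if the difference disappears in that second comparison, batching/padding would seem to be involved.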
