How can I encode 10 strings into embeddings in parallel?


I am experimenting with a BERT model and the SentenceTransformer library. I know how to turn a string into an embedding, but how do I encode 10 strings into embeddings in parallel?

Here is an example of how I turn one string into an embedding:

!pip install -U sentence-transformers

from sentence_transformers import SentenceTransformer

sentences = ["This is an example sentence"]

model = SentenceTransformer('sentence-transformers/LaBSE')
embeddings = model.encode(sentences)

How can I turn these 10 sentences into embeddings in parallel?

[
    "This is an example sentence",
    "Each sentence is converted",
    "Learning new skills is an ongoing journey.",
    "The sky painted a mesmerizing array of colors at sunset.",
    "Technology continues to revolutionize the way we live.",
    "Understanding different perspectives fosters empathy.",
    "Nature's beauty often lies in its simplicity.",
    "Traveling opens doors to diverse cultures and traditions.",
    "Music has the power to evoke strong emotions.",
    "Kindness is a language that transcends barriers."
]

There is 1 answer

Answered by petezurich

The code is the same. Just pass a list of 10 (or any number of) sentences to the model.encode() function.

sentences = [
    "This is an example sentence",
    "Each sentence is converted",
    "Learning new skills is an ongoing journey.",
    "The sky painted a mesmerizing array of colors at sunset.",
    "Technology continues to revolutionize the way we live.",
    "Understanding different perspectives fosters empathy.",
    "Nature's beauty often lies in its simplicity.",
    "Traveling opens doors to diverse cultures and traditions.",
    "Music has the power to evoke strong emotions.",
    "Kindness is a language that transcends barriers."
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # (10, 768)

According to the documentation, encode() takes the texts to embed as its sentences parameter, typically a list of strings (a single string is also accepted):

Parameters: sentences – the sentences to embed

The encode() function processes your texts in batches. The default batch size is 32, so up to 32 texts are processed together in parallel in a single forward pass. You can change this with the batch_size parameter.
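To make the batching concrete, here is a minimal sketch. The helper names split_into_batches and encode_with_batch_size are hypothetical and only for illustration. The first function shows roughly how a list gets chunked for a given batch size; it is not the library's internal implementation, which handles batching for you. The second simply passes batch_size to model.encode(), assuming sentence-transformers is installed and the LaBSE model from the question can be downloaded.

```python
from typing import List, Sequence


def split_into_batches(texts: Sequence[str], batch_size: int = 32) -> List[List[str]]:
    """Illustration only: chunk texts into consecutive groups of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [list(texts[i:i + batch_size]) for i in range(0, len(texts), batch_size)]


def encode_with_batch_size(sentences, batch_size=32,
                           model_name="sentence-transformers/LaBSE"):
    """Hypothetical wrapper: embed all sentences with one encode() call."""
    # Imported lazily so the helper above can be used without the library installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() does the batching itself; batch_size only sets how many
    # texts go through the model together.
    return model.encode(sentences, batch_size=batch_size)
```

With the 10 sentences from the question and the default batch size of 32, everything fits into a single batch. A batch size of 4 would give three batches of 4, 4 and 2 sentences, but the result of encode() would still be one array with 10 rows.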