Running Flair embeddings in parallel

I have a list containing millions of sentences for which I need embeddings. I am using Flair for this purpose. The problem looks embarrassingly parallel, but when I try to parallelize it, I get either no increase in performance or the program stalls entirely.

I define my sentences as a simple list of strings:

texts = [
    "this is a test",
    "to see how well",
    "this system works",
    "here are alot of words",
    "many of them",
    "they keep comming",
    "many more sentences",
    "so many",
    "some might even say",
    "there are 10 of them",
]

I use Flair to create the embeddings:

from flair.embeddings import SentenceTransformerDocumentEmbeddings
from flair.data import Sentence

sentence_embedding = SentenceTransformerDocumentEmbeddings("bert-base-nli-mean-tokens")

def sentence_to_vector(sentence):
    sentence_tokens = Sentence(sentence)
    sentence_embedding.embed(sentence_tokens)
    return sentence_tokens.get_embedding().tolist()

I tried both joblib and concurrent.futures to solve the problem in parallel:

import time
from joblib import Parallel, delayed
import concurrent.futures

def parallelize(iterable, func):
    return Parallel(n_jobs=4, prefer="threads")(delayed(func)(i) for i in iterable)

print("start embedding sequentially")
tic = time.perf_counter()
embeddings = [sentence_to_vector(text) for text in texts]
toc = time.perf_counter()
print(toc - tic)

print("start embedding parallel, w. joblib")
tic = time.perf_counter()
embeddings = parallelize(texts, sentence_to_vector)
toc = time.perf_counter()
print(toc - tic)

print("start embedding parallel w. concurrent.futures")
tic = time.perf_counter()
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(sentence_to_vector, text) for text in texts]
    # submit returns Future objects; .result() is needed to get the actual vectors
    embeddings = [f.result() for f in futures]
toc = time.perf_counter()
print(toc - tic)

The joblib version runs, but it is slower than doing it sequentially. The concurrent.futures version spins up a number of worker processes but then hangs indefinitely.

Any solutions or hints in the right direction would be much appreciated.

There is 1 answer.

Answered by Deepak Garud:

Using the analogy of a trained model: it appears that the trained model can only process one item at a time.

By making copies of the script and running them all, parallel processing should work without problems: e.g. prog1.py, prog2.py, ... are copies of the same code, and each gets different data to process when run. To run them in parallel manually, open multiple command windows and run a different copy in each (a sketch of such a worker script follows below).
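
A minimal sketch of what each copy might look like, assuming the sentences live in a file called sentences.txt and that the chunk index and total chunk count are passed on the command line. The file names and command-line arguments here are hypothetical, not something Flair prescribes; the embedding calls themselves mirror the question's code.

# worker.py (hypothetical) -- run as: python worker.py <chunk_id> <num_chunks>
import sys
import json

from flair.data import Sentence
from flair.embeddings import SentenceTransformerDocumentEmbeddings

chunk_id = int(sys.argv[1])
num_chunks = int(sys.argv[2])

# each copy loads its own model instance
sentence_embedding = SentenceTransformerDocumentEmbeddings("bert-base-nli-mean-tokens")

# read the full sentence list, then keep only this copy's share
with open("sentences.txt") as f:
    texts = [line.strip() for line in f]
my_texts = texts[chunk_id::num_chunks]

embeddings = []
for text in my_texts:
    sentence_tokens = Sentence(text)
    sentence_embedding.embed(sentence_tokens)
    embeddings.append(sentence_tokens.get_embedding().tolist())

# write this copy's results so they can be combined later
with open(f"embeddings_{chunk_id}.json", "w") as f:
    json.dump(embeddings, f)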

To run them programmatically, a master program can create subprocesses and send different data to each, or a batch file can launch the programs: e.g. run 10 copies of your script and send 1/10 of your sentences to each, as in the multiprocessing sketch below.
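
One way to do this from a single master script is Python's multiprocessing, loading a separate model instance inside each worker rather than in the parent (loading the model before forking is a common cause of the kind of hang seen with ProcessPoolExecutor above). This is only a sketch under those assumptions, not Flair's official multiprocessing API; the worker count and chunking are arbitrary, and note that each worker holds its own copy of the model in memory.

import multiprocessing as mp

from flair.data import Sentence
from flair.embeddings import SentenceTransformerDocumentEmbeddings

model = None  # one model instance per worker process

def init_worker():
    # runs once in each worker: load the model there, not in the parent
    global model
    model = SentenceTransformerDocumentEmbeddings("bert-base-nli-mean-tokens")

def embed_chunk(chunk):
    result = []
    for text in chunk:
        sentence_tokens = Sentence(text)
        model.embed(sentence_tokens)
        result.append(sentence_tokens.get_embedding().tolist())
    return result

if __name__ == "__main__":
    texts = ["this is a test", "to see how well", "this system works"]  # your full list

    # split the sentences into one contiguous chunk per worker
    num_workers = 4
    chunk_size = -(-len(texts) // num_workers)  # ceiling division
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]

    # "spawn" gives each worker a clean interpreter, avoiding fork/CUDA issues
    ctx = mp.get_context("spawn")
    with ctx.Pool(num_workers, initializer=init_worker) as pool:
        results = pool.map(embed_chunk, chunks)

    # combining the results is just flattening the per-chunk lists, in order
    embeddings = [vec for chunk_result in results for vec in chunk_result]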

Then combine the results.

Keep an eye on the CPU and memory used, to avoid the machine hitting 100% CPU usage. (Slowly increase the number of programs and the amount of data as you figure out how many parallel programs the computer can handle; a small monitoring sketch follows.)
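
A small sketch of that kind of monitoring, assuming the third-party psutil package is installed (psutil is not part of the answer above, just one convenient way to watch the load while you scale up the number of workers):

import time
import psutil

# sample overall CPU and memory usage every few seconds while the workers run
for _ in range(60):
    cpu = psutil.cpu_percent(interval=1)   # % CPU over the last second
    mem = psutil.virtual_memory().percent  # % of RAM in use
    print(f"CPU: {cpu:5.1f}%  RAM: {mem:5.1f}%")
    if cpu > 90 or mem > 90:
        print("close to saturation, avoid launching more workers")
    time.sleep(4)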