How should returned attention masks be accessed from the FeatureExtractionPipeline in Hugging Face?
The code below takes an embedding model, distributes it and a Hugging Face dataset across 8 GPUs on a single node, and performs inference on the inputs. The code requires the attention masks for mean pooling.
Code example:
from accelerate import Accelerator
from transformers import AutoTokenizer, AutoModel, pipeline
from datasets import load_dataset
import torch
accelerator = Accelerator()
model_name = "BAAI/bge-large-en-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)
pipe = pipeline(
    "feature-extraction",
    model=model,
    tokenizer=tokenizer,
    max_length=512,
    truncation=True,
    padding=True,
    batch_size=256,
    framework="pt",
    return_tensors=True,
    return_attention_mask=True,
    device=accelerator.device,
)
dataset = load_dataset(
    "wikitext",
    "wikitext-2-v1",
    split="train",
)
# Mean pooling: take the attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # the first element of model_output holds the token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# Assume 8 processes
with accelerator.split_between_processes(dataset["text"]) as data:
    for out in pipe(data):
        sentence_embeddings = mean_pooling(out, out["attention_mask"])
I need the attention masks from pipe to use for mean pooling.
Best,
Enrico
The pipeline object from the transformers library provides a convenient abstraction for quick inference of models, but for more customized solutions it's usually a good idea to use the models directly. For example:
You can then use the returned embeddings and the attention_mask to compute the mean pooling. You may also consider using out.pooler_output instead of computing the mean pooling manually; however, I am not sure how the pooler_output is computed in this case, so be wary.
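If you do want to experiment with it, it is just another field on the same output object (whether it matches the pooling the BGE authors recommend is something I have not verified):

cls_embeddings = out.pooler_output  # (batch_size, hidden_size); in BERT-style models this is the CLS token passed through a dense + tanh layer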