I am using the BERT SQuAD model to ask the same question of a collection of more than 20,000 documents. The model currently runs on my CPU and takes around a minute to process a single document, which means that I'll need several days to complete the program.
I was wondering if I could speed this up by running the model on a GPU. However, I am new to GPUs and I don't know how to send these inputs and the model to the device (a Titan Xp).
The code is borrowed from Chris McCormick.
import torch
import tensorflow as tf
from transformers import BertForQuestionAnswering
from transformers import BertTokenizer
model = BertForQuestionAnswering.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
tokenizer = BertTokenizer.from_pretrained('bert-large-uncased-whole-word-masking-finetuned-squad')
# 'question' and 'answer_text' are the question and the context string, respectively.
input_ids = tokenizer.encode(question, answer_text)
# ======== Set Segment IDs ========
# Search the input_ids for the first instance of the `[SEP]` token.
sep_index = input_ids.index(tokenizer.sep_token_id)
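if len(input_ids) > 512:
    # BERT accepts at most 512 tokens; truncate longer inputs (this also cuts off the final [SEP]).
    input_ids = input_ids[:512]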
num_seg_a = sep_index + 1
num_seg_b = len(input_ids) - num_seg_a
# Construct the list of 0s and 1s.
segment_ids = [0]*num_seg_a + [1]*num_seg_b
# There should be a segment_id for every input token.
assert len(segment_ids) == len(input_ids)
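The truncation and segment-ID steps above can be wrapped in a small helper and checked without loading BERT. This is a sketch that mirrors the original logic rather than a library function: truncate_and_segment is a hypothetical name, and the token IDs in the example are illustrative (101 and 102 are [CLS] and [SEP] in the uncased BERT vocabulary; the others stand in for word pieces).

```python
from typing import List, Tuple


def truncate_and_segment(input_ids: List[int], sep_token_id: int,
                         max_len: int = 512) -> Tuple[List[int], List[int]]:
    """Return (truncated input_ids, segment IDs) for "[CLS] question [SEP] context [SEP]".

    Mirrors the snippet above: the first [SEP] is located before truncating,
    tokens up to and including it get segment 0, every later token gets 1.
    """
    sep_index = input_ids.index(sep_token_id)  # end of the question segment
    input_ids = input_ids[:max_len]            # BERT's maximum sequence length
    num_seg_a = sep_index + 1
    num_seg_b = len(input_ids) - num_seg_a
    segment_ids = [0] * num_seg_a + [1] * num_seg_b
    # There should be a segment ID for every input token.
    assert len(segment_ids) == len(input_ids)
    return input_ids, segment_ids


# Illustrative IDs: 101 = [CLS], 102 = [SEP].
ids = [101, 2054, 2003, 102, 3000, 3001, 3002, 102]
trunc, segs = truncate_and_segment(ids, sep_token_id=102, max_len=6)
print(trunc)  # [101, 2054, 2003, 102, 3000, 3001]
print(segs)   # [0, 0, 0, 0, 1, 1]
```

With the default max_len=512, a short input like this one is left untouched and gets the segment IDs [0, 0, 0, 0, 1, 1, 1, 1].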
# ======== Evaluate ========
# Run our example through the model.
outputs = model(torch.tensor([input_ids]), # The tokens representing our input text.
token_type_ids=torch.tensor([segment_ids]), # The segment IDs to differentiate question from answer_text
return_dict=True)
start_scores = outputs.start_logits
end_scores = outputs.end_logits
I know that I can send the model to the GPU using model.to('cuda') (or model.cuda()). But how do I send the inputs to the GPU, run the model, and then retrieve the output from the GPU?
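It's been a while, but I'll answer anyway in the hope that it will help someone. You can copy each tensor to the GPU using the .to() method. For example, suppose your batch contains four PyTorch tensors: input IDs, attention masks, segment IDs, and labels. Move each of them with tensor.to(device). Note that Tensor.to() returns a copy on the target device, so you have to reassign the result (input_ids = input_ids.to(device)), whereas model.to(device) moves the model's parameters in place.
Then you can use the .cpu() method to transfer the logits and labels from the GPU back to the CPU. Alternatively, just as with .to(device), you can call .to('cpu').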
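Note: if you want to work with them outside PyTorch, for example to compute metrics with NumPy, you will probably also need to append .numpy() to convert them to NumPy arrays. Call .cpu() first, because .numpy() does not work on CUDA tensors (and add .detach() if the tensors still require gradients).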
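A minimal sketch of the whole round trip described above, assuming PyTorch and NumPy are installed. To keep it runnable without downloading BERT, TinyQAModel is a hypothetical stand-in that, like BertForQuestionAnswering, returns one start and one end logit per token, and the batch is random. With the real model you would call .to(device) on the BertForQuestionAnswering instance and pass the moved tensors in the same way. The argmax step at the end is one common way to read off an answer span and is an assumption here, not part of the original answer.

```python
import numpy as np
import torch

# Use the GPU if PyTorch can see one; otherwise fall back to the CPU so the sketch still runs.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class TinyQAModel(torch.nn.Module):
    """Hypothetical stand-in for BertForQuestionAnswering (start/end logit per token)."""

    def __init__(self, vocab_size: int = 100, hidden: int = 16):
        super().__init__()
        self.tokens = torch.nn.Embedding(vocab_size, hidden)
        self.segments = torch.nn.Embedding(2, hidden)
        self.qa_outputs = torch.nn.Linear(hidden, 2)

    def forward(self, input_ids, attention_mask, token_type_ids):
        h = self.tokens(input_ids) + self.segments(token_type_ids)
        h = h * attention_mask.unsqueeze(-1)  # zero out padded positions
        start_logits, end_logits = self.qa_outputs(h).unbind(dim=-1)
        return start_logits, end_logits


model = TinyQAModel().to(device)  # Module.to() moves the parameters in place and returns the module
model.eval()

# Hypothetical CPU batch of four tensors: input IDs, attention masks, segment IDs, labels.
batch = (
    torch.randint(0, 100, (4, 8)),
    torch.ones(4, 8, dtype=torch.long),
    torch.cat([torch.zeros(4, 3, dtype=torch.long),
               torch.ones(4, 5, dtype=torch.long)], dim=1),
    torch.randint(0, 8, (4,)),
)

# Tensor.to() returns a copy on the target device, so the results must be reassigned.
b_input_ids, b_input_mask, b_segment_ids, b_labels = (t.to(device) for t in batch)

with torch.no_grad():  # inference only: no autograd graph is built
    start_logits, end_logits = model(b_input_ids, b_input_mask, b_segment_ids)

# Bring the results back to host memory; .to("cpu") behaves like .cpu().
start_cpu = start_logits.cpu()
labels_cpu = b_labels.to("cpu")

# .numpy() only accepts CPU tensors (outside no_grad it would also need .detach()).
start_np = start_cpu.numpy()
end_np = end_logits.cpu().numpy()

# Most likely answer span per example: the highest-scoring start and end positions.
answer_starts = np.argmax(start_np, axis=1)
answer_ends = np.argmax(end_np, axis=1)
print(answer_starts, answer_ends)
```

Running under torch.no_grad() is worth keeping for the 20,000-document loop: it skips gradient bookkeeping, which also means the outputs can be converted with .cpu().numpy() without a .detach().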
Source: https://discuss.pytorch.org/t/time-to-transform-gpu-to-cpu-with-cpu/18856