I tried the seq2seq pytorch implementation available here seq2seq . After profiling the evaluation(evaluate.py) code, the piece of code taking longer time was the decode_minibatch method
def decode_minibatch(
config,
model,
input_lines_src,
input_lines_trg,
output_lines_trg_gold
):
"""Decode a minibatch."""
for i in xrange(config['data']['max_trg_length']):
decoder_logit = model(input_lines_src, input_lines_trg)
word_probs = model.decode(decoder_logit)
decoder_argmax = word_probs.data.cpu().numpy().argmax(axis=-1)
next_preds = Variable(
torch.from_numpy(decoder_argmax[:, -1])
).cuda()
input_lines_trg = torch.cat(
(input_lines_trg, next_preds.unsqueeze(1)),
1
)
return input_lines_trg
Trained the model on GPU and loaded the model in CPU mode to make inference. But unfortunately, every sentence seems to take ~10sec. Is slow prediction expected on pytorch?
Any fixes, suggestions to speed up would be much appreciated. Thanks.
Following on dragon7 answer, I'd recommend using the ONNX export from Optimum that can handle out of the box the export for encoder/decoder models, as well as make use of past key values in the decoder:
If you want to use OpenVINO, a good option can be the
OVModel
that handle the inference with OpenVINO (especially for seq2seq models) out of the box!Disclaimer: I am a contributor to Optimum library