I'm trying to train a seq2seq model (Transformer) with both PyTorch and tensor2tensor. With tensor2tensor the batch size can be as large as 1024, while my PyTorch model throws a CUDA out-of-memory error even with a batch size of 8.
Is there any technique used in tensor2tensor to make the best use of memory?
If anyone knows, please tell me.
Thanks in advance.
In Tensor2Tensor, the batch size is by default specified as the number of tokens (subwords) per single GPU. This allows using a larger number of short sequences (sentences) in one batch, or a smaller number of long sequences. Most other toolkits instead use a fixed batch size specified as a number of sequences. Either way, it is a good idea to limit the maximum sentence length during training to a reasonable value, to prevent out-of-memory errors and excessive padding. Also note that some toolkits specify the total batch size across all GPU cards rather than per GPU.
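To illustrate the idea (this is a rough sketch of token-based batching you could use on the PyTorch side, not Tensor2Tensor's actual implementation; `max_tokens` and `max_len` are illustrative names, not T2T flags):

    import torch
    from torch.nn.utils.rnn import pad_sequence

    def batch_by_tokens(sequences, max_tokens=1024, max_len=256):
        """Group variable-length sequences (1-D LongTensors of token ids)
        into padded batches whose total size (batch * longest sequence,
        i.e. tokens including padding) stays under max_tokens."""
        # Drop overly long sequences to avoid OOM and excessive padding.
        sequences = [s for s in sequences if len(s) <= max_len]
        # Sorting by length keeps similar-length sequences together,
        # which minimizes the amount of padding inside each batch.
        sequences.sort(key=len)

        batches, current = [], []
        for seq in sequences:
            # Since sequences are sorted, seq is the longest so far;
            # check the padded size if it were added to the current batch.
            if current and len(seq) * (len(current) + 1) > max_tokens:
                batches.append(pad_sequence(current, batch_first=True))
                current = []
            current.append(seq)
        if current:
            batches.append(pad_sequence(current, batch_first=True))
        return batches

With this kind of batching, a batch of 1024 tokens might contain ~50 short sentences or only a handful of long ones, so memory use per step stays roughly constant regardless of sentence length.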