Torch's loss.backward() hangs on ParlAI


I am interested in Memory Networks and Movie Dialog QA. Facebook recently announced an AI training framework called ParlAI, which supports many models and datasets. When I ran the command below in ParlAI, training stalled at the first loss.backward() call in memnn.py. I waited almost a day, but loss.backward() never finished. I verified this with debug prints (and the [Using Cuda] message was printed). The GPU itself appeared to be active, since it was holding some memory; I confirmed that with nvidia-smi -l 1.

python examples/train_model.py -m memnn -t "#moviedd-qa" -bs 32 --gpu 0 -e 10

Then I switched to a simpler task, and it finished in a few minutes.

python examples/train_model.py -m memnn -t "babi:task1k:1" -bs 32 --gpu 0 -e 10

I realize #moviedd-qa is more complex than the bAbI task, but how long does training this model usually take on my setup? Has anyone trained this model with ParlAI? I suspect this is not a bug in ParlAI. Could you advise me on how to proceed?
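
One way to tell a genuine deadlock apart from a very slow epoch is to dump the Python stack while the process appears stuck. A minimal sketch, assuming it is added near the top of examples/train_model.py (faulthandler is in the Python 3 standard library):

import faulthandler
import signal

# Dump the Python traceback of every thread when the process receives
# SIGUSR1, e.g. via `kill -USR1 <pid>` from another terminal. If the dump
# always points at loss.backward(), it is a real hang; if it moves between
# data loading, forward and backward, training is just slow.
faulthandler.register(signal.SIGUSR1, all_threads=True)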

My Environment

  • Ubuntu 16.04.3 LTS, 64-bit
  • Python 3.6.1 (Anaconda 4.4.0, 64-bit)
  • GPU: GTX 1080 Ti
  • CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
  • torch.__version__: '0.2.0_3'

I have also asked the ParlAI developers on their GitHub, but have received no response so far.
