I am interested in Memory Networks and Movie Dialog QA. Facebook recently announced an AI training framework called ParlAI, which supports many models and datasets. I tried the command below on ParlAI, but training stalled at the first `loss.backward()` call in `memnn.py`. I waited almost a full day, but `loss.backward()` never finished. I confirmed this with debug prints and the `[Using Cuda]` output. My GPU was actually in use, since some of its memory was allocated; I checked this with `nvidia-smi -l 1`.
python examples/train_model.py -m memnn -t "#moviedd-qa" -bs 32 --gpu 0 -e 10
Then I switched to a simpler task, and it finished in a few minutes.
python examples/train_model.py -m memnn -t "babi:task1k:1" -bs 32 --gpu 0 -e 10
I realize `#moviedd-qa` is more complicated than the bAbI task. But how long does it usually take to train this model with my setup? Has anyone tried to train this model via ParlAI? I suspect this is not a bug in ParlAI. Could you advise me on how to proceed with my work?
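To make the question concrete, this is the kind of back-of-envelope estimate I would like to sanity-check (the function and the numbers below are placeholders, not measurements):

```python
def estimate_epoch_hours(num_examples, batch_size, seconds_per_batch):
    """Rough epoch-time estimate: number of batches times time per batch."""
    num_batches = num_examples / float(batch_size)
    return num_batches * seconds_per_batch / 3600.0

# Hypothetical numbers, just to show the arithmetic:
# 100,000 examples, batch size 32, 2 s per batch -> ~1.7 hours per epoch.
print(estimate_epoch_hours(100000, 32, 2.0))
```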
My Environment
- Ubuntu 16.04.3 LTS, 64-bit
- python 3.6.1 (Anaconda 4.4.0 (64-bit))
- GPU: GTX 1080 Ti
- CPU: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
- torch.__version__: '0.2.0_3' (checked with the snippet below)
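This is roughly how I confirmed the PyTorch version and that CUDA is visible to it (a minimal sketch using standard PyTorch calls):

```python
import torch

# Print the installed PyTorch version and whether CUDA is visible to it.
print('torch version:', torch.__version__)
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    # Index of the GPU that PyTorch would use by default.
    print('current device index:', torch.cuda.current_device())
```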
I have also asked the ParlAI developers on their GitHub, but have received no response so far.