I am training Mozilla DeepSpeech on the Common Voice dataset on Ubuntu 16.04 LTS x64 with 4 Nvidia GeForce GTX 1080 GPUs by executing the command:
./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv \
--dev_files data/common-voice-v1/cv-valid-dev.csv \
--test_files data/common-voice-v1/cv-valid-test.csv \
--log_level 0 --train_batch_size 20 --train True \
--decoder_library_path ./libctc_decoder_with_kenlm.so \
--checkpoint_dir cv001 --export_dir cv001export \
--summary_dir cv001summaries --summary_secs 600 \
--wer_log_pattern "GLOBAL LOG: logwer('${COMPUTE_ID}', '%s', '%s', %f)" \
--validation_step 2
It uses over 80% of the 4 GPUs.
However, if I add the --display_step 2 argument, it significantly slows down the training and uses less than 20% of the 4 GPUs.
This surprises me, as --display_step is described as:
tf.app.flags.DEFINE_integer ('validation_step', 0, 'number of epochs we cycle through before validating the model - a detailed progress report is dependent on "--display_step" - 0 means no validation steps')
so my understanding is that the model should only be evaluated once every 2 epochs, which shouldn't slow down the per-step training time (it should just add some evaluation time once every 2 epochs).
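In other words, my expectation corresponds to a training loop along these lines (a purely illustrative sketch of that assumption, not DeepSpeech's actual code; run_training_step and run_validation are hypothetical placeholders):

def run_training_step(batch):
    pass  # hypothetical placeholder for one forward/backward pass on a batch

def run_validation():
    pass  # hypothetical placeholder for a dev-set evaluation

def train(batches, num_epochs, validation_step=2):
    # Per-batch training cost stays the same; extra work is only done
    # once every `validation_step` epochs.
    for epoch in range(1, num_epochs + 1):
        for batch in batches:
            run_training_step(batch)   # same cost on every step
        if validation_step and epoch % validation_step == 0:
            run_validation()           # additional cost only here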
Why does adding the --display_step 2 argument significantly slow down Mozilla DeepSpeech training time?
Probably because either (a) you are not giving a test batch-size argument, so it defaults to size 1, or (b) this is a really old option that is now deprecated and therefore not optimized in any way.
Please state which version you are using; the current version is 0.9.2. Maybe update?
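Regarding (a): the dev and test batch sizes are controlled by separate flags that default to 1 when not passed on the command line, roughly along these lines (a paraphrased sketch, not an exact quote of DeepSpeech.py; check your checkout for the exact names, defaults and help strings):

import tensorflow as tf

tf.app.flags.DEFINE_integer ('train_batch_size', 1, 'number of elements in a training batch')
tf.app.flags.DEFINE_integer ('dev_batch_size', 1, 'number of elements in a validation batch')
tf.app.flags.DEFINE_integer ('test_batch_size', 1, 'number of elements in a test batch')

If that is the cause, passing --dev_batch_size and --test_batch_size explicitly (mirroring your --train_batch_size 20), assuming your version supports them, should keep the GPUs busy during the evaluation passes as well.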