Why does adding the `--display_step 2` argument significantly slow down Mozilla DeepSpeech training time?


I am training Mozilla DeepSpeech on the Common Voice data set on Ubuntu 16.04 LTS x64 with 4 Nvidia GeForce GTX 1080 GPUs, by executing the command:

./DeepSpeech.py --train_files data/common-voice-v1/cv-valid-train.csv \
--dev_files data/common-voice-v1/cv-valid-dev.csv  \
--test_files data/common-voice-v1/cv-valid-test.csv  \
--log_level 0 --train_batch_size 20 --train True  \
--decoder_library_path ./libctc_decoder_with_kenlm.so  \
--checkpoint_dir cv001 --export_dir cv001export  \
--summary_dir cv001summaries --summary_secs 600  \
--wer_log_pattern "GLOBAL LOG: logwer('${COMPUTE_ID}', '%s', '%s', %f)"  \
--validation_step 2 

With this command, utilization stays above 80% on all 4 GPUs.

However, if I add the --display_step 2 argument, training slows down significantly and utilization drops below 20% on the 4 GPUs.
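(For reference, I am reading the utilization figures off nvidia-smi. As a rough illustration only, not part of DeepSpeech, a small standalone polling script like the following could be used to watch the per-GPU utilization over time:)

```python
# Standalone GPU-utilization monitor (hypothetical helper, not part of
# DeepSpeech): polls `nvidia-smi` once per second and prints the per-GPU
# utilization so the drop caused by --display_step can be observed.
import subprocess
import time

def gpu_utilization():
    """Return a list of GPU utilization percentages reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [int(line) for line in out.splitlines() if line.strip()]

if __name__ == "__main__":
    while True:
        print(gpu_utilization())  # e.g. [83, 81, 85, 80]
        time.sleep(1)
```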

This surprises me, because the flag definitions only describe --display_step as gating a detailed progress report (here in the definition of --validation_step):

tf.app.flags.DEFINE_integer ('validation_step', 0, 'number of epochs we cycle through before validating the model - a detailed progress report is dependent on "--display_step" - 0 means no validation steps')

so my understanding is that the model should only be evaluated once every 2 epochs, and therefore the training itself should not slow down (i.e., it should just add some evaluation time once every 2 epochs).
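To make that expectation concrete, here is a minimal sketch of how I would expect an epoch-gated report to behave (my own illustration, not the actual DeepSpeech training loop; the helper functions are placeholders): the extra work only happens on epochs that are a multiple of display_step, so ordinary training steps are untouched.

```python
# Hypothetical sketch of an epoch-gated progress report (NOT the actual
# DeepSpeech training loop): the detailed report is only triggered on
# epochs that are a multiple of display_step.

def run_training_epoch(epoch):
    print(f"epoch {epoch}: training...")          # placeholder for real training

def run_validation(epoch):
    print(f"epoch {epoch}: validating...")        # placeholder for validation

def print_detailed_progress_report(epoch):
    print(f"epoch {epoch}: detailed progress report (WER etc.)")  # placeholder

def train(num_epochs, display_step=2, validation_step=2):
    for epoch in range(1, num_epochs + 1):
        run_training_epoch(epoch)
        if validation_step and epoch % validation_step == 0:
            run_validation(epoch)                 # every `validation_step` epochs
        if display_step and epoch % display_step == 0:
            print_detailed_progress_report(epoch) # extra cost expected only here

if __name__ == "__main__":
    train(num_epochs=4)
```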

Why does adding the --display_step 2 argument significantly slow down Mozilla DeepSpeech training time?

1 answer

Answered by Olaf:

Probably because either (a) you are not giving a test batch size argument and therefore use a batch size of 1, or (b) this is a really old option that is now deprecated and therefore not optimized in any way.

Please state which version you are using; the current version is 0.9.2. Maybe update?