Error Training DeepSpeech inside Docker with Ubuntu 20.04 integration on Windows 10 (NVIDIA RTX 3090 GPU)

Question

Asked by Emanuel Zamorano on 11 November 2023 at 23:11 (95 views)

I'm working with Mozilla DeepSpeech in a Docker environment and have encountered an error during training. I'm seeking assistance to resolve this issue.

System Setup:

Docker environment on a Windows 10 PC
Ubuntu 20.04 in Docker
NVIDIA RTX 3090 GPU, with the --gpus all flag enabled
CUDA 10.0 (version 10.0.130) with cuDNN v7.6.5 (November 5th, 2019) for CUDA 10.0
Python 3.7.3

Steps Taken:

Installed the official DeepSpeech training image for Docker (mozilla/deepspeech-train:v0.9.3), following the exact steps in the DeepSpeech Playbook (https://mozilla.github.io/deepspeech-playbook/ENVIRONMENT.html#contents).
Successfully ran the provided script (./bin/run-ldc93s1.sh) in the Docker environment.
Created a custom training script for my dataset.
Ran into file path problems, which I resolved by mounting the WSL 2 directory into the Docker container.
Updated the script paths to match the mounted directory.

My Script:

```
root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
  --train_files /DeepSpeech/CSV/Training/training.csv \
  --dev_files /DeepSpeech/CSV/Validation/dev.csv \
  --test_files /DeepSpeech/CSV/Test/test.csv \
  --alphabet_config_path /DeepSpeech/data/alphabet.txt \
  --scorer_path /DeepSpeech/deepspeech-0.9.3-models.scorer \
  --checkpoint_dir /DeepSpeech/checkpoints_dir \
  --export_dir /DeepSpeech/CSV/exports_dir \
  --train_batch_size 1 \
  --test_batch_size 1 \
  --n_hidden 100 \
  --epochs 200 \
  --noshow_progressbar
```

Issue: When running my custom training script, I encounter the following error:

```
Traceback (most recent call last):
  File "DeepSpeech.py", line 12, in <module>
    ds_train.run_script()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
    absl.app.run(main)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/DeepSpeech/training/deepspeech_training/train.py", line 949, in main
    early_training_checks()
  File "/DeepSpeech/training/deepspeech_training/train.py", line 934, in early_training_checks
    FLAGS.scorer_path, Config.alphabet)
  File "/usr/local/lib/python3.6/dist-packages/ds_ctcdecoder/__init__.py", line 36, in __init__
    raise ValueError('Scorer initialization failed with error code 0x{:X}'.format(err))
ValueError: Scorer initialization failed with error code 0x2005
```

Tried looking for the path:

    root@b11bd0a278ee:/DeepSpeech# ls /DeepSpeech/deepspeech-0.9.3-models.scorer
    ls: cannot access '/DeepSpeech/deepspeech-0.9.3-models.scorer': No such file or directory

Found the path:

    root@b11bd0a278ee:/DeepSpeech# find / -type f \( -name "alphabet.txt" -o -name "*.csv" -o -name "*.scorer" \)
    /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer
    /DeepSpeechData/DeepSpeech/data/alphabet.txt
    /DeepSpeechData/DeepSpeech/CSV/Test/test.csv
    /DeepSpeechData/DeepSpeech/CSV/Training/training.csv
    /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv
    /DeepSpeechData/DeepSpeech/CSV/Model Checkpoints/Model Checkpoints.csv
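
A pre-flight check along these lines could confirm that every path passed to DeepSpeech.py exists inside the container before training starts. This is only a sketch: `missing_paths` is a hypothetical helper, not part of DeepSpeech, and the paths are the ones reported by `find` above.

```python
import os


def missing_paths(flag_paths):
    """Return {flag: path} for every path that does not exist on disk."""
    return {flag: path for flag, path in flag_paths.items()
            if not os.path.exists(path)}


if __name__ == "__main__":
    # Paths as seen from inside the container (taken from the find output).
    flags = {
        "--train_files": "/DeepSpeechData/DeepSpeech/CSV/Training/training.csv",
        "--dev_files": "/DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv",
        "--test_files": "/DeepSpeechData/DeepSpeech/CSV/Test/test.csv",
        "--alphabet_config_path": "/DeepSpeechData/DeepSpeech/data/alphabet.txt",
        "--scorer_path": "/DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer",
    }
    for flag, path in missing_paths(flags).items():
        print(f"{flag}: {path} not found")
```

Running it inside the container prints one line per missing file and nothing when all paths resolve.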

2nd Try:

    root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py \
      --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \
      --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \
      --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \
      --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \
      --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \
      --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \
      --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \
      --train_batch_size 1 \
      --test_batch_size 1 \
      --n_hidden 100 \
      --epochs 200 \
      --noshow_progressbar
    I Loading best validating checkpoint from /DeepSpeechData/DeepSpeech/checkpoints_dir/best_dev-1466475
    I Loading variable from checkpoint: beta1_power
    I Loading variable from checkpoint: beta2_power
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
        absl.app.run(main)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
        _run_main(main, args)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
        train()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
        load_or_init_graph_for_training(session)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
        _load_or_init_impl(session, methods, allow_drop_layers=True)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
        return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
        v.load(ckpt.get_tensor(v.op.name), session=session)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py", line 324, in new_func
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py", line 1033, in load
        session.run(self.initializer, {self.initializer.inputs[1]: value})
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 956, in run
        run_metadata_ptr)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py", line 1156, in _run
        (np_val.shape, subfeed_t.name, str(subfeed_t.get_shape())))
    ValueError: Cannot feed value of shape (8192,) for Tensor 'cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Initializer/Const:0', which has shape '(400,)'
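
The two shapes in this error are consistent with an LSTM bias that stacks four gate vectors of n_hidden entries each. Under that assumption, the checkpoint in checkpoints_dir was saved with n_hidden 2048, while this run built the graph with n_hidden 100. A small sketch of the arithmetic (the function name is illustrative, not a DeepSpeech API):

```python
def hidden_size_from_lstm_bias(bias_len):
    """Recover n_hidden from an LSTM bias length, assuming 4 gates per unit."""
    if bias_len % 4 != 0:
        raise ValueError(f"{bias_len} is not a multiple of 4")
    return bias_len // 4


checkpoint_hidden = hidden_size_from_lstm_bias(8192)  # value read from the checkpoint
graph_hidden = hidden_size_from_lstm_bias(400)        # tensor built by this run
print(checkpoint_hidden, graph_hidden)  # 2048 100
```

That mismatch is why the next attempt passes --n_hidden 2048.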

3rd Try:

    root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py   --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv   --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv   --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv   --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt   --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer   --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir   --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir   --train_batch_size 1   --test_batch_size 1   --n_hidden 2048   --epochs 200   --noshow_progressbar
    I Loading best validating checkpoint from /DeepSpeechData/DeepSpeech/checkpoints_dir/best_dev-1466475
    I Loading variable from checkpoint: beta1_power
    I Loading variable from checkpoint: beta2_power
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias
    I Loading variable from checkpoint: cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam
    Traceback (most recent call last):
      File "DeepSpeech.py", line 12, in <module>
        ds_train.run_script()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 982, in run_script
        absl.app.run(main)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 300, in run
        _run_main(main, args)
      File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 251, in _run_main
        sys.exit(main(argv))
      File "/DeepSpeech/training/deepspeech_training/train.py", line 954, in main
        train()
      File "/DeepSpeech/training/deepspeech_training/train.py", line 529, in train
        load_or_init_graph_for_training(session)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 137, in load_or_init_graph_for_training
        _load_or_init_impl(session, methods, allow_drop_layers=True)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 98, in _load_or_init_impl
        return _load_checkpoint(session, ckpt_path, allow_drop_layers, allow_lr_init=allow_lr_init)
      File "/DeepSpeech/training/deepspeech_training/util/checkpoints.py", line 71, in _load_checkpoint
        v.load(ckpt.get_tensor(v.op.name), session=session)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/pywrap_tensorflow_internal.py", line 915, in get_tensor
        return CheckpointReader_GetTensor(self, compat.as_bytes(tensor_str))
    tensorflow.python.framework.errors_impl.NotFoundError: Key cudnn_lstm/rnn/multi_rnn_cell/cell_0/cudnn_compatible_lstm_cell/bias/Adam not found in checkpoint

4th Try:

    root@b11bd0a278ee:/DeepSpeech# python -u DeepSpeech.py   --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv   --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv   --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv   --alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt   --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer   --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir   --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir   --train_batch_size 1   --test_batch_size 1   --n_hidden 2048   --epochs 200   --noshow_progressbar --use_cudnn_rnn
    
    FATAL Flags parsing error: Unknown command line flag 'use_cudnn_rnn'
    Pass --helpshort or --helpfull to see help on flags.

5th Try: added the --train_cudnn flag, but the command produced no output:

    root@0123a1149260:/DeepSpeech# python -u DeepSpeech.py \ --train_files /DeepSpeechData/DeepSpeech/CSV/Training/training.csv \ --dev_files /DeepSpeechData/DeepSpeech/CSV/Validation/dev.csv \ --test_files /DeepSpeechData/DeepSpeech/CSV/Test/test.csv \ alphabet_config_path /DeepSpeechData/DeepSpeech/data/alphabet.txt \ --scorer_path /DeepSpeechData/DeepSpeech/deepspeech-0.9.3-models.scorer \ --checkpoint_dir /DeepSpeechData/DeepSpeech/checkpoints_dir \ --export_dir /DeepSpeechData/DeepSpeech/CSV/exports_dir \ --train_batch_size 1 \ --test_batch_size 1 \ --n_hidden 100 \ --epochs 200 \ --noshow_progressbar --train_cudnn

    root@0123a1149260:/DeepSpeech#
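
One possible explanation for the silent exit, assuming the command was entered on a single line with each backslash followed by a space rather than a newline: the shell then escapes the space, so the next flag arrives as an argument that begins with a space, and a flag parser would likely treat it as positional. If no training or test files end up registered, the script may simply have nothing to do. The sketch below uses Python's `shlex` to show the tokenization; `flag_like_positionals` is a hypothetical helper and the short command is a stand-in for the real one.

```python
import shlex


def flag_like_positionals(command):
    """Return tokens that look like flags but start with whitespace.

    Such tokens do not begin with '-', so a flag parser would see them
    as positional arguments instead of flags.
    """
    return [tok for tok in shlex.split(command)
            if tok != tok.lstrip() and tok.lstrip().startswith("--")]


# Shortened stand-in for the pasted command; /data/training.csv is a placeholder.
one_line = (r"python -u DeepSpeech.py \ --train_files /data/training.csv "
            r"\ --n_hidden 100 --train_cudnn")
print(flag_like_positionals(one_line))  # [' --train_files', ' --n_hidden']
```

If this is what happened, ending each line with a backslash immediately followed by Enter, or removing the backslashes from a one-line command, would let the flags parse normally.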

Question:

What could be causing this error in my setup?
Are there specific considerations or best practices when setting up DeepSpeech training in a Docker environment that I might be missing?

Any insights or suggestions to resolve this error would be greatly appreciated.
