Why does building a new scorer output an empty string for deepspeech 0.9.3?

52 views Asked by jbflow At 29 November 2023 at 19:46

I am trying to create a limited corpus and train a language model to use for a deepspeech scorer.

I have followed the information provided in the docs here.

I read a helpful guide, posted for an older version of deepspeech, on generating a language model here.

I have also read the playbook here.

It seems that this has been encountered before, but no answer was given there.

I have set up the docker environment for training and followed the docs to the letter.

I can train a model, and then convert this to a .scorer file, so the whole process is working.

The steps I take from inside the docker container are:

1. Create a vocab.txt file with my input sentences and store it in the deepspeech-data/input folder.
2. Run this script to build the model in the output folder:
   python3 generate_lm.py --input_txt ../../deepspeech-data/input/vocab.txt --output_dir ../../deepspeech-data/output --top_k 100 --kenlm_bins /DeepSpeech/native_client/kenlm/build/bin/ --arpa_order 5 --max_arpa_memory "85%" --arpa_prune "0|0|0|0" --binary_a_bits 255 --binary_q_bits 8 --binary_type trie --discount_fallback
3. Run this script to generate the scorer:
   ./generate_scorer_package --alphabet ../../deepspeech-data/input/alphabet.txt --lm ../../deepspeech-data/output/lm.binary --vocab ../../deepspeech-data/output/vocab-100.txt --package ../../deepspeech-data/output/deepspeech-0.9.3-models.scorer --default_alpha 0.9 --default_beta 0.9 --force_bytes_output_mode 1
4. Replace the default scorer with my own.
5. Run deepspeech.
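For illustration, the first step (building the sentence file) could be sketched roughly as below. This is only a minimal sketch: the command phrases, the `write_corpus` helper, and the temporary output path are all hypothetical, and it assumes the corpus should hold one lowercased sentence per line.

```python
import os
import tempfile
from pathlib import Path


def write_corpus(sentences, path):
    """Write one normalized sentence per line and return the lines written."""
    lines = []
    for sentence in sentences:
        # Lowercase and collapse repeated whitespace (an assumed normalization).
        normalized = " ".join(sentence.lower().split())
        # Skip empty entries and exact duplicates.
        if normalized and normalized not in lines:
            lines.append(normalized)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Hypothetical command phrases; replace with the real limited vocabulary.
commands = ["Turn on the light", "turn off the light", "  Stop  ", "stop"]

# Written to a temp directory here; the real file would go in the input folder.
corpus_path = os.path.join(tempfile.gettempdir(), "vocab.txt")
corpus_lines = write_corpus(commands, corpus_path)
```

The resulting file is what would then be passed to generate_lm.py via --input_txt.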

Everything appears to work as it should, with no errors, but with my scorer in place deepspeech just returns an empty string. With the default scorer it works fine, but I need to restrict the vocabulary so that it only detects a few commands.
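One sanity check that may be worth running on a setup like this (a possible thing to rule out, not a confirmed cause of the empty output) is whether every character in the corpus also appears in alphabet.txt. The sketch below assumes the alphabet file lists one label per line, treats lines starting with `#` as comments, and treats `\#` as an escaped hash; the helper names and the inline sample data are hypothetical.

```python
def load_alphabet(lines):
    """Collect labels from alphabet-style lines (assumed: one label per line)."""
    labels = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("\\#"):
            labels.add("#")   # escaped hash used as a real label
        elif line.startswith("#"):
            continue          # comment line
        elif line:
            labels.add(line)  # includes the single-space label
    return labels


def uncovered_chars(corpus_lines, alphabet):
    """Return the sorted characters used in the corpus but absent from the alphabet."""
    used = {ch for line in corpus_lines for ch in line.rstrip("\n")}
    return sorted(used - alphabet)


# Hypothetical sample data standing in for alphabet.txt and vocab.txt.
alphabet_lines = ["# comment\n", " \n", "a\n", "b\n", "c\n", "'\n"]
corpus = ["a cab\n", "abc d\n"]

missing = uncovered_chars(corpus, load_alphabet(alphabet_lines))
print(missing)
```

To run it against real files, the two lists can be replaced with the lines read from alphabet.txt and vocab.txt (opened with UTF-8 encoding).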

I have tried adjusting some of the flags, but I always get the same result.

I am using the --discount_fallback flag, as suggested, because it is a small corpus.

So my question is this: why would a deepspeech language model/scorer output an empty string, and how can I fix it?

I am running this inside the NodeJS example from the github examples, but testing against any of the examples should reproduce it.

There are 0 answers