How can I evaluate WER (Word Error Rate) in ASR (Automatic Speech Recognition)?


For example, suppose I have a human reference transcript (ground truth) for each sentence along with the corresponding ASR output.

I know the equation, but I do not know how to calculate it. Do I include punctuation marks such as commas and full stops in the calculation of WER?

Also, for word substitutions, insertions, and deletions: does each of them carry a specific weight in the equation?

Could anyone who knows how to calculate WER for ASR please give me an example, so that I can calculate WER over the multiple sentences I have in my app?

There are 4 answers

Nikolay Shmyrev

Do I enter punctuation marks such as a comma and full stop and so on in the calculation of WER?

You strip punctuation before calculation and convert everything to lowercase.
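As a minimal sketch of that normalization step in Python — the function name `normalize` and the exact rules (ASCII punctuation only, whitespace collapsing) are my own assumptions; adapt them to your language and tokenizer:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so that, e.g., 'Hello, world.'
    and 'hello world' compare as the same word sequence."""
    text = text.lower()
    # Remove every ASCII punctuation character (comma, full stop, etc.)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse any whitespace runs left behind
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Hello, world. How are you?"))  # hello world how are you
```

Apply the same normalization to both the reference and the ASR hypothesis before computing WER, otherwise punctuation differences show up as spurious substitutions.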

Could anyone who knows how to calculate WER for ASR please give me an example, so that I can calculate WER over the multiple sentences I have in my app?

You can use this Python package:

https://pypi.org/project/jiwer/

If you need other languages, let us know which ones.

Alok Prasad

A simple C++ implementation based on Levenshtein distance; just a single file without any library dependencies.

https://github.com/alokprasad/asr-wer

Kenny

Refer to this repository to calculate the Word Error Rate (WER) of two strings in Colab.

You can also remove punctuation when calculating WER by ticking the remote_punctuation checkbox.


Word Error Rate Visualization with Colab: https://github.com/duckyngo/Word-Error-Rate-Visualization-with-Colab

Naval

Word Error Rate is calculated using the formula:

Word Error Rate = (Substitutions + Insertions + Deletions) / Number of Words in the Reference
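As a sketch of the formula above, here is a minimal word-level WER in plain Python (the function name `wer` and the whitespace tokenization are my own choices, not from any of the linked libraries; for production use a tested package such as jiwer):

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (S + I + D) / number of reference words, computed with
    word-level Levenshtein (edit) distance. Assumes a non-empty reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]           # match, no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One insertion ("big") and one deletion ("down"): (0 + 1 + 1) / 4 = 0.5
print(wer("the cat sat down", "the big cat sat"))
```

Note that substitutions, insertions, and deletions all cost 1 in the standard definition (answering the weighting question above); some toolkits allow non-uniform weights, but that is the exception.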

If we analyze this, it is very simple: we first get the total number of insertions, deletions, and substitutions in the ASR output by comparing it with the actual transcript (ground truth). For WER the comparison is done at the word level, but the same idea applies at the character level: an insertion can be a single character or word or a run of several, a deletion can likewise span multiple characters or words, and when wrong characters or words are produced in place of the correct ones, that is a substitution error.

The question now is how to identify these types of errors. For this, the Levenshtein distance metric is used.

The Levenshtein distance is a measurement of the differences between two “strings.” The strings are sequences of letters that make up the words in a transcription.

Let’s look at some examples for better understanding

  1. “happy” and “gappy.” Here just a single letter is changed, so the Levenshtein distance is only 1.
  2. For “cat” and “kake tea”: “ca” becomes “kake” by 1 substitution and 2 insertions, “t” becomes “tea” by 2 more insertions, and the space itself is 1 further insertion. So the Levenshtein distance is 6 here.
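The two examples above can be checked with a short character-level Levenshtein function in Python (a sketch; the function name `levenshtein` is my own). Note that in the second example the space between “kake” and “tea” counts as one more insertion:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance via dynamic programming,
    keeping only one row of the DP table at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution/match
        prev = curr
    return prev[-1]

print(levenshtein("happy", "gappy"))   # 1
print(levenshtein("cat", "kake tea"))  # 6 (the space is an insertion too)
```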

After this, divide the total edit distance by the number of words in the reference; this gives you the WER.