How can I evaluate WER (Word Error Rate) in ASR (Automatic Speech Recognition)?


For example, suppose I have a human reference transcript (ground truth) for each sentence along with the corresponding ASR output.

I know the equation, but I do not know how to calculate it. Do I include punctuation marks such as commas and full stops in the calculation of WER?

Also, for word substitutions, insertions, and deletions: does each of them carry a specific weight in the equation?

Could anyone who knows how to calculate WER for ASR please give me an example, so that I can calculate WER over the multiple sentences I have in my app?

There are 4 answers

Nikolay Shmyrev

Do I enter punctuation marks such as a comma and full stop and so on in the calculation of WER?

You strip punctuation before calculation and convert everything to lowercase.
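As a minimal sketch of that normalization step in Python — the function name `normalize` and the exact rules (ASCII punctuation only, whitespace collapsing) are my own assumptions; adapt them to your language and tokenizer:

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase and strip punctuation so that, e.g., 'Hello, world.'
    and 'hello world' compare as the same word sequence."""
    text = text.lower()
    # Remove every ASCII punctuation character (comma, full stop, etc.)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse any whitespace runs left behind
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Hello, world. How are you?"))  # hello world how are you
```

Apply the same normalization to both the reference and the ASR hypothesis before computing WER, otherwise punctuation differences show up as spurious substitutions.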

Could anyone who knows how to calculate WER for ASR please give me an example, so that I can calculate WER over the multiple sentences I have in my app?

You can use this Python package:

https://pypi.org/project/jiwer/

If you need other languages, let us know which ones.

Alok Prasad

A simple C++ implementation based on Levenshtein distance; just a single file without any library dependencies.

https://github.com/alokprasad/asr-wer

Kenny

Refer to this repository to calculate the Word Error Rate (WER) of two strings in Colab.

You can also remove punctuation when calculating WER by ticking the remote_punctuation checkbox.


Word Error Rate Visualization with Colab: https://github.com/duckyngo/Word-Error-Rate-Visualization-with-Colab

Naval

Word Error Rate is calculated using the formula:

Word Error Rate = (Substitutions + Insertions + Deletions) / Number of Words in the Reference
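As a sketch of the formula above, here is a minimal word-level WER in plain Python (the function name `wer` and the whitespace tokenization are my own choices, not from any of the linked libraries; for production use a tested package such as jiwer):

```python
def wer(reference: str, hypothesis: str) -> float:
    """WER = (S + I + D) / number of reference words, computed with
    word-level Levenshtein (edit) distance. Assumes a non-empty reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            if ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]           # match, no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion
    return dp[len(ref)][len(hyp)] / len(ref)

# One insertion ("big") and one deletion ("down"): (0 + 1 + 1) / 4 = 0.5
print(wer("the cat sat down", "the big cat sat"))
```

Note that substitutions, insertions, and deletions all cost 1 in the standard definition (answering the weighting question above); some toolkits allow non-uniform weights, but that is the exception.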

If we analyze this, it is very simple: we first get the total number of insertions, deletions, and substitutions in the ASR output by comparing it with the actual transcript (ground truth). For WER the comparison is done at the word level, but the same idea applies at the character level: an insertion can be a single character or word or a run of several, a deletion can likewise span multiple characters or words, and when wrong characters or words are produced in place of the correct ones, that is a substitution error.

The question now is how to identify these types of errors. For this, the Levenshtein distance metric is used.

The Levenshtein distance is a measurement of the differences between two “strings.” The strings are sequences of letters that make up the words in a transcription.

Let’s look at some examples for better understanding

  1. “happy” and “gappy.” Here just a single letter is changed, so the Levenshtein distance is only 1.
  2. For “cat” and “kake tea”: “ca” becomes “kake” by 1 substitution and 2 insertions, “t” becomes “tea” by 2 more insertions, and the space itself is 1 further insertion. So the Levenshtein distance is 6 here.
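The two examples above can be checked with a short character-level Levenshtein function in Python (a sketch; the function name `levenshtein` is my own). Note that in the second example the space between “kake” and “tea” counts as one more insertion:

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level Levenshtein distance via dynamic programming,
    keeping only one row of the DP table at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]                  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution/match
        prev = curr
    return prev[-1]

print(levenshtein("happy", "gappy"))   # 1
print(levenshtein("cat", "kake tea"))  # 6 (the space is an insertion too)
```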

After this, divide the total edit distance by the number of words in the reference; this gives you the WER.