VoiceXML: how many words can a grammar hold?


I want to have a dynamic grammar in my VoiceXML file (read individual products and generate the grammar with PHP).

My question is whether there is any advice or experience on how many words should be written into the source from which I read the products. I don't know much about the structure or pronunciation of the words, so consider these cases:

a) the words are rather different from each other
b) the words have largely the same structure or pronunciation
c) a mix of a) and b)
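For context, the kind of dynamically generated grammar described in the question can be sketched roughly as follows. Python is used here purely for illustration in place of PHP, and the product names and helper function are hypothetical:

```python
from xml.sax.saxutils import escape

def build_srgs_grammar(products):
    """Build a minimal SRGS XML grammar whose root rule matches any product name."""
    items = "\n".join(
        f"      <item>{escape(name)}</item>" for name in products
    )
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" xml:lang="en-US" root="product">
  <rule id="product" scope="public">
    <one-of>
{items}
    </one-of>
  </rule>
</grammar>"""

print(build_srgs_grammar(["espresso machine", "coffee grinder", "milk frother"]))
```

A PHP endpoint would do the same thing: query the product table, emit one `<item>` per product, and serve the result with an SRGS content type.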

Thanks in advance.

1 answer

Answer by Jim Rush (accepted):

I'm assuming you mean SRGS grammars when you refer to a dynamic grammar for VoiceXML.

Unfortunately, you're going to have to do performance testing under a reasonable load to really know for sure. I've successfully transmitted 1M+ grammars under certain conditions. I've also done 10,000 name lists. I've also come across platforms that can only utilize a few dozen entries.

The speech recognition engine (ASR) and VoiceXML platform are going to have a significant impact on your results, as will the number of concurrent recognitions against this grammar and the overall recognition load.

The factors you mention do have an impact on recognition performance and CPU load, but I've typically found the size of the grammar and the length/variability of entries to matter more. For example, yes/no grammars typically have a much higher CPU load than complex menu grammars (short phrases tend to require more passes and leave open a larger number of possibilities during processing). I've seen some horrible numbers from wide-ranging digit grammars (9-31 digits). The sounds are short and difficult to disambiguate. The variability in components, again, creates a large number of paths that have to be continuously checked for a solution. Most menu or natural speaking phrases have longer words that sound significantly different, so many paths can be quickly excluded.
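As a rough illustration of why wide-ranging digit grammars are expensive, the number of distinct digit strings such a grammar accepts grows exponentially with length. This is only a back-of-the-envelope count of the language size, not a measure of actual ASR search effort, which also depends on pruning, acoustic confusability, and beam settings:

```python
def digit_grammar_paths(min_len, max_len, alphabet=10):
    """Count the distinct digit strings a min_len..max_len digit grammar accepts."""
    return sum(alphabet ** n for n in range(min_len, max_len + 1))

# A 9-31 digit grammar accepts roughly 1.1e31 distinct strings.
print(f"{digit_grammar_paths(9, 31):.3e}")
```

Compare that with a menu grammar of a few dozen acoustically distinct phrases, where the recognizer can discard most hypotheses after the first word.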

Some tips:

Most enterprise-class ASR systems support a cache. If you can identify grammars with URL parameters and set whatever HTTP header information the ASR needs (don't assume they follow the standards), you may see a significant performance boost.
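One way to make a dynamically generated grammar cache-friendly is to serve it with explicit cache validators. A minimal sketch, assuming the ASR's fetcher honors standard `Cache-Control` and `ETag` headers (verify this against your platform; the `max-age` value and helper are hypothetical):

```python
import hashlib

def grammar_response_headers(grammar_xml, max_age=300):
    """Build HTTP response headers for a grammar so an ASR-side cache
    can revalidate cheaply instead of refetching and recompiling."""
    etag = '"' + hashlib.sha256(grammar_xml.encode("utf-8")).hexdigest()[:16] + '"'
    return {
        "Content-Type": "application/srgs+xml",
        "Cache-Control": f"max-age={max_age}",
        "ETag": etag,
    }

headers = grammar_response_headers("<grammar>...</grammar>")
print(headers)
```

The ETag is derived from the grammar body, so the same product list always yields the same validator and a conditional refetch can return 304 Not Modified.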

Prompts can often hide grammar loading/compiling phases. If you have a relatively long prompt where people will tend to barge in, you'll find that you can hide some fairly large grammar fetches. Again, not all platforms do a good job of processing these tasks in parallel. Note that most ASR engines can collect audio and perform end-pointing while still fetching and compiling the grammar. This buys you more time, but you'll see the impact in longer latencies.
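In VoiceXML terms, this pattern is a barge-in-enabled prompt alongside a grammar reference; the `fetchhint="prefetch"` attribute additionally lets the platform fetch the grammar early. A sketch (the URL and form names are placeholders, and platform support for prefetching varies):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="product-choice">
    <field name="product">
      <!-- A long, barge-in-enabled prompt gives the platform time to
           fetch and compile the grammar in the background. -->
      <prompt bargein="true">
        Welcome. You can say the name of any product in our catalog
        at any time, or stay on the line to hear today's featured items.
      </prompt>
      <grammar src="products.php" type="application/srgs+xml"
               fetchhint="prefetch"/>
    </field>
  </form>
</vxml>
```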

Most ASR engines provide tools that let you analyze a grammar with sample audio. The tools will usually give you CPU resource indicators. I've rarely found that you can calculate or predict overall performance, due to the complexities around recognition concurrency, but they can give you a comparative impact against other grammars. I have yet to find an engine that makes it easy to track grammar processing times, so it can be difficult even to roughly estimate concurrency challenges. In most cases, large-scale testing has been necessary.

After grammar load/compile times, recognition concurrency is the most significant performance impact. I've seen a few applications that have highly complex grammars near the beginning of the call. There were high levels of recognition concurrency without an opportunity to cache (a platform issue at the time), which led to scaling challenges (intermittent, large latencies in recognition processing).