Retrieval from the database with Sphinx4

I am building a voice-driven dictionary application. The dictionary itself is finished and holds about 100000 words, and it needs to be searchable by voice. For this I am using Sphinx4 (CMUSphinx). I have read the references on the related websites and successfully run the sample applications. I then applied the same approach as the HelloWorld sample to my dictionary, after putting all 100000 words into the grammar file (.gram). When I run it, the application freezes, and after about 5 minutes Eclipse reports "Java Heap Size Out of Memory".

Grammar configuration:

#JSGF V1.0;
grammar hello;
public <database> = ([<Words>])*;
<Words> = 100000 words separated by "|";

For Sphinx4, I used this version: http://sourceforge.net/projects/cmusphinx/files/sphinx4/1.0%20beta6/

Is my method of implementing voice search in my dictionary correct?

Are there any good references for building such a search engine over a large database of words (approximately 100000 words)?

I hope you can help me.

There is 1 answer

Nikolay Shmyrev

The approach is OK.

If the JVM does not have enough memory, you can increase the maximum heap size with the -Xmx option.
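
As a quick way to confirm that a larger heap setting actually took effect, here is a minimal sketch that prints the maximum heap the JVM will try to use. The class name HeapCheck and the 2 GB value used below are illustrative assumptions, not values taken from the question. In Eclipse the flag can be placed under Run Configurations > Arguments > VM arguments.

```java
// HeapCheck.java: hypothetical helper for illustration, not part of Sphinx4.
public class HeapCheck {

    /** Returns the maximum heap size the JVM will attempt to use, in megabytes. */
    public static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with and without -Xmx to compare the reported limit.
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```

Running it as `java -Xmx2g HeapCheck` should report a limit of roughly 2 GB (the exact figure can be slightly lower depending on the garbage collector), while plain `java HeapCheck` shows the default for the machine.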

For accurate retrieval, it is better to create a unigram language model with word frequencies rather than a plain list of words. For details, see

http://cmusphinx.sourceforge.net/wiki/tutoriallm
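
To make the idea of frequency weighting concrete, here is a rough sketch that counts words in a text and prints each word with its unsmoothed log10 relative frequency, log probability first and word second, which is the ordering ARPA-format model files use for unigram entries. It is only an illustration: the class UnigramSketch, its method names, and the tiny sample corpus are all assumptions, and a real model built as described in the tutorial would also include smoothing, sentence markers, and a file header.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// UnigramSketch.java: hypothetical illustration, not the CMUSphinx tooling.
public class UnigramSketch {

    /** Counts how often each whitespace-separated word occurs in the text. */
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Formats unsmoothed log10 relative frequencies, one "logprob word" per line. */
    public static String toUnigramLines(Map<String, Integer> counts) {
        long total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            double logProb = Math.log10((double) entry.getValue() / total);
            out.append(String.format(Locale.ROOT, "%.4f %s%n", logProb, entry.getKey()));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample corpus; a real one would be text where dictionary words occur naturally.
        String corpus = "apple banana apple cherry apple banana";
        System.out.print(toUnigramLines(countWords(corpus)));
    }
}
```

Frequent words get a higher (less negative) score than rare ones, which is what lets a recognizer prefer the common word when two candidates sound alike.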

For the best accuracy, it is better to use the latest high-level API. For details, see

http://cmusphinx.sourceforge.net/wiki/sphinx4