I'm trying to create a Sinhala voice recognition system using pocketsphinx, and I use the SRILM toolkit to create the language model. My source files for the language model are here. I'm using Cygwin on Windows 8.1 to run SRILM 1.7.1. But when I run the command
ngram-count -vocab sinhalalexicon.txt -text sinhalacorpus.Train -order 3 -write sinhala.count -unk
I get the following error:
iconv: Invalid or incomplete multibyte or wide character
iconv: Invalid or incomplete multibyte or wide character
What did I do wrong here? The sinhalacorpus.Train file was created manually in Notepad++.
I found the solution to my issue. Once I converted the corpus and lexicon files to Unix format (LF line endings) and changed the encoding to UTF-8 without BOM, it worked. I used Notepad++ to make the changes.
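
For anyone who would rather script the conversion than do it by hand in Notepad++, here is a minimal Python sketch of the same fix. It is not part of SRILM or pocketsphinx, and the function name normalize_for_srilm is just something I made up for illustration. It assumes the files are either UTF-8 (with or without BOM) or UTF-16 with a BOM; if your files were saved in some other encoding, the decode step would need to change.

```python
import codecs
import sys
from pathlib import Path


def normalize_for_srilm(path):
    """Rewrite the file in place as UTF-8 without BOM, with LF line endings.

    Hypothetical helper: the name and the encoding guesses are assumptions,
    not anything defined by SRILM.
    """
    raw = Path(path).read_bytes()

    if raw.startswith(codecs.BOM_UTF8):
        # Drop the 3-byte UTF-8 BOM before decoding.
        text = raw[len(codecs.BOM_UTF8):].decode("utf-8")
    elif raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        text = raw.decode("utf-16")
    else:
        # Assumption: a file with no BOM is already UTF-8.
        text = raw.decode("utf-8")

    # Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF).
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Encoding with "utf-8" never writes a BOM.
    Path(path).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    # Files to convert are taken from the command line.
    for name in sys.argv[1:]:
        normalize_for_srilm(name)
```

Saved as, say, normalize.py (any name works), it could be run on sinhalalexicon.txt and sinhalacorpus.Train before calling ngram-count. It overwrites the files in place, so keep a backup copy first.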