Issue creating a language model for Sinhala using SRILM


I'm trying to create a Sinhala voice recognition system using pocketsphinx, and I use the SRILM toolkit to create the language model. My source files for the language model are here. I'm running SRILM 1.7.1 under Cygwin on Windows 8.1. But when I run the command

ngram-count -vocab sinhalalexicon.txt -text sinhalacorpus.Train -order 3 -write sinhala.count -unk

I'm getting

iconv: Invalid or incomplete multibyte or wide character
iconv: Invalid or incomplete multibyte or wide character

What did I do wrong here? The sinhalacorpus.Train file was created manually using Notepad++.


There is 1 answer

dab1984

I found the solution to my issue. Once I converted the corpus and lexicon files to Unix format (LF line endings) and changed the encoding to UTF-8 without BOM, it worked. I used Notepad++ to make the changes.
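
For anyone who would rather script the same conversion than click through Notepad++, here is a minimal Python sketch. It is not what I actually ran, and it makes some assumptions: the function name normalize_text_file is made up for illustration, and the default source_encoding assumes the file is already UTF-8 (with or without a BOM). If Notepad++ saved the file in another encoding, such as UTF-16, you would need to pass that instead.

```python
from pathlib import Path


def normalize_text_file(src, dst, source_encoding="utf-8-sig"):
    """Rewrite src as UTF-8 without BOM and with Unix (LF) line endings.

    Hypothetical helper for illustration. "utf-8-sig" decodes UTF-8 and
    drops a leading BOM if one is present; pass e.g. "utf-16" if the
    file was saved in that encoding instead (assumption: you know the
    original encoding).
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)
    # Convert Windows (CRLF) and stray CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Plain "utf-8" never writes a BOM.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling it once for the corpus and once for the lexicon, for example normalize_text_file("sinhalacorpus.Train", "sinhalacorpus.unix.Train") where the output name is only an example, should give files equivalent to the ones I produced in Notepad++. Reading and writing bytes rather than text mode avoids Python's own newline translation on Windows.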