Adding only user-words to Tesseract


I am using Tesseract in my Android application. I created a "user-words" file and added the line marked in bold below so that the OCR engine takes the user-words file into account.

String language = "deu";
datapath = getFilesDir()+ "/tesseract/";
Tess = new TessBaseAPI();

checkFile(new File(datapath + "tessdata/"));
**Tess.setVariable("user_words_suffix","deu.user-words");**
Tess.init(datapath, language);

I did not define a user-patterns file, since my images do not contain any specific patterns. I just copied the UTF-8 text file deu.user-words into the tessdata folder. Is this enough to configure the OCR, or should I unpack deu.traineddata, add this file to it, and then repack it? If so, can you give me a hint on how to do that?


There is 1 answer

nguyenq (best answer)

You don't need to include the language prefix in the code, because Tesseract prepends the language code to the suffix itself:

Tess.setVariable("user_words_suffix", "user-words");

Make sure the prefix of the file name matches the language code you pass to init. For "deu", the file in the tessdata folder must be named deu.user-words.
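To double-check the naming, here is a minimal sketch of how the expected file location can be derived from the data path, the language code, and the suffix. UserWordsFile, locate, and write are hypothetical helper names for illustration and are not part of tess-two or Tesseract. The sketch also assumes the word list is plain UTF-8 text with one word per line, and the two sample words are made up.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of tess-two or Tesseract.
final class UserWordsFile {

    // Builds <datapath>/tessdata/<language>.<suffix>, the naming scheme described above.
    static File locate(String datapath, String language, String suffix) {
        return new File(new File(datapath, "tessdata"), language + "." + suffix);
    }

    // Writes the words as UTF-8, one per line (assumed format), creating tessdata/ if needed.
    static File write(String datapath, String language, String suffix, List<String> words)
            throws IOException {
        File target = locate(datapath, language, suffix);
        File dir = target.getParentFile();
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Could not create " + dir);
        }
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            for (String word : words) {
                out.write(word);
                out.write('\n');
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // A temp directory stands in for getFilesDir() + "/tesseract/" on Android.
        String datapath = System.getProperty("java.io.tmpdir") + "/tesseract-demo/";
        File f = write(datapath, "deu", "user-words", Arrays.asList("Straße", "Bahnhof"));
        System.out.println(f.getName()); // prints "deu.user-words"
    }
}
```

In the app itself, the same datapath and language would go to Tess.init, and only the "user-words" part would go to setVariable.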

https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
https://github.com/tesseract-ocr/tesseract/wiki/ControlParams