Adding only user-words to Tesseract

Question

3.4k views Asked by MKH At 14 December 2016 at 09:30

I am using Tesseract in my Android application. I created a "user-words" file and added the bold line below so that the OCR engine takes the user-words file into account.

String language = "deu";
datapath = getFilesDir()+ "/tesseract/";
Tess = new TessBaseAPI();

checkFile(new File(datapath + "tessdata/"));
**Tess.setVariable("user_words_suffix","deu.user-words");**
Tess.init(datapath, language);

I did not define a user-patterns file, since there is no specific pattern in my images. I just copied the UTF-8 text file deu.user-words into the tessdata folder. Is this enough to configure the OCR? Or should I unpack deu.traineddata, add this file to it, and then repack it? If so, can you give me a hint on how to do that?

Original Q&A

There is 1 answer

**nguyenq** · Accepted Answer · 2016-12-16T04:17:12+00:00

You don't need to specify the language prefix in the code:

Tess.setVariable("user_words_suffix", "user-words");

Make sure the file's prefix matches the specified language code -- namely, deu.user-words.
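To make the naming rule concrete, here is a minimal sketch in plain Java (no Tesseract dependency) that builds the file name as language code + "." + suffix inside the tessdata folder and writes a UTF-8 word list, one word per line. The class name UserWordsFile, both helper methods, and the sample words are hypothetical and only for illustration. On Android you would pass the same datapath you hand to Tess.init, for example getFilesDir() + "/tesseract/".

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Tesseract or tess-two.
final class UserWordsFile {

    // Builds <datapath>/tessdata/<language>.<suffix>, e.g. .../tessdata/deu.user-words
    static File userWordsFile(String datapath, String language, String suffix) {
        return new File(new File(datapath, "tessdata"), language + "." + suffix);
    }

    // Writes the words as UTF-8 text, one word per line, and returns the file.
    static File writeUserWords(String datapath, String language, String suffix,
                               List<String> words) throws IOException {
        File target = userWordsFile(datapath, language, suffix);
        File dir = target.getParentFile();
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("Could not create " + dir);
        }
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(target), StandardCharsets.UTF_8)) {
            for (String word : words) {
                out.write(word);
                out.write('\n');
            }
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a scratch directory stands in for getFilesDir() + "/tesseract/".
        String datapath = new File(System.getProperty("java.io.tmpdir"), "tesseract").getPath();
        File written = writeUserWords(datapath, "deu", "user-words",
                Arrays.asList("Bahnhof", "Rechnung"));
        System.out.println("Wrote " + written.getPath());
    }
}
```

With the file in place, the setVariable call only needs the suffix "user-words", because the language code passed to init supplies the "deu." part of the file name.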

https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc
https://github.com/tesseract-ocr/tesseract/wiki/ControlParams
