How to augment the OCR output of tess-two using user data on Android?


I am using tess-two to OCR documents. The Tesseract team has done a great job, and the results are extremely good.
But now I want recognition to use only the words from my user_data file.
There is a very good example here: https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc#config-files-and-augmenting-with-user-data. I have found every file that the Tesseract documentation mentions in my Android project, but I cannot make the program use the words from the user_data file. I have found the bazaar config file in the configs folder, but how do I set it in my code?
Is there something I am missing?
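
To make the setup concrete, here is a minimal sketch of the two files that the linked section describes, as far as I understand it: a `bazaar` config under `configs/` and a per-language `eng.user-words` word list in the tessdata folder. The class `UserDataFiles` and its method names are hypothetical helpers, not part of tess-two or Tesseract, and the config lines are reproduced from the manual as I read it. The sketch uses `java.nio.file`, which I am assuming is available on the target Android API level.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of tess-two): writes the user-data files to disk. */
final class UserDataFiles {

    /** Lines of the "bazaar" config, assumed to match the example in the Tesseract manual. */
    static List<String> bazaarConfigLines() {
        return Arrays.asList(
                "load_system_dawg     F",
                "load_freq_dawg       F",
                "user_words_suffix    user-words",
                "user_patterns_suffix user-patterns");
    }

    /** Writes <tessdataDir>/configs/bazaar and returns the path of the written file. */
    static Path writeBazaarConfig(Path tessdataDir) throws IOException {
        Path configsDir = tessdataDir.resolve("configs");
        Files.createDirectories(configsDir);
        return Files.write(configsDir.resolve("bazaar"), bazaarConfigLines(), StandardCharsets.UTF_8);
    }

    /** Writes <tessdataDir>/<lang>.user-words, one word per line, and returns its path. */
    static Path writeUserWords(Path tessdataDir, String lang, List<String> words) throws IOException {
        Files.createDirectories(tessdataDir);
        return Files.write(tessdataDir.resolve(lang + ".user-words"), words, StandardCharsets.UTF_8);
    }
}
```

With these in place, the path returned by `writeBazaarConfig` would be the one passed to `ReadConfigFile`. Whether the engine then actually restricts itself to the user words is exactly the part I have not been able to confirm.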

Below is the part of the code where I initialize TessBaseAPI and set the options.

    TessBaseAPI baseApi = new TessBaseAPI();
    baseApi.setDebug(true);

    baseApi.init(Environment.getExternalStorageDirectory() + "/EMB/dataBase/", "eng");

    // Only the last setPageSegMode call takes effect, so a single call is kept.
    // OEM_TESSERACT_CUBE_COMBINED is an OCR engine mode, not a page segmentation
    // mode; it belongs in init(), not in setPageSegMode().
    baseApi.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_LINE);


    baseApi.ReadConfigFile("/path/to/configs/bazaar");      

    baseApi.setImage(myBitmap);
    // Run recognition and read the result as UTF-8 text.
    String recognizedText = baseApi.getUTF8Text();
    /* recognizedText = recognizedText.replaceAll(blackList, ""); // remove space */
    String resultTxt = recognizedText;
    // Release native resources before updating the UI.
    baseApi.end();
    ocrreadytext.setText(resultTxt);

Thanks in advance!


There is 1 answer

rmtheis