Tesseract is giving junk data as output for the Japanese language

Question

Asked by Aditya on 04 September 2017 at 14:04 (686 views)

I'm trying to build a sample application in Java for the Japanese language that reads an image file and just outputs the text extracted from the image. I found a sample application online that works perfectly for English, but for Japanese it gives unidentified text. Here is my code:

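    // Imports for the JavaCPP presets of that era (2017); newer presets
    // use the org.bytedeco.tesseract and org.bytedeco.leptonica packages instead.
    import org.bytedeco.javacpp.BytePointer;
    import org.bytedeco.javacpp.lept.PIX;
    import org.bytedeco.javacpp.tesseract.TessBaseAPI;

    import static org.bytedeco.javacpp.lept.pixDestroy;
    import static org.bytedeco.javacpp.lept.pixRead;

    public class JapaneseOcr {
        public static void main(String[] args) {
            BytePointer outText;

            TessBaseAPI api = new TessBaseAPI();
            // Initialize tesseract-ocr with Japanese ("jpn"), using "." as the data path
            if (api.Init(".", "jpn") != 0) {
                System.err.println("Could not initialize tesseract.");
                System.exit(1);
            }

            // Open input image with leptonica library
            PIX image = pixRead("test.png");
            api.SetImage(image);
            // Get OCR result
            outText = api.GetUTF8Text();
            String string = outText.getString();
            // Plain check in place of JUnit's assertTrue, since this is a standalone app
            if (string.isEmpty()) {
                System.err.println("OCR returned no text.");
            }
            System.out.println("OCR output:\n" + string);

            // Destroy used objects and release memory
            api.End();
            outText.deallocate();
            pixDestroy(image);
        }
    }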

My output is:

    OCR output:
    ETCã‚«ãƒ¼-ãƒ¼ãƒ‰ç”³ è¾¼æ›¸ ã?Šç”³ã?—è¾¼ã?¿æ—¥ 09/02/2017 ETC FeatureID ETCFFL ãƒ¼ç”³è¾¼æžšè¼©äº¤ ç”» æžš

I have used the jpn tessdata, and my application is reading the tessdata file as well. Is any more configuration needed? I'm using Tesseract 3.02 with a very clean image.
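The junk in the output above follows a recognizable pattern: it is what UTF-8 encoded Japanese looks like when the bytes are decoded with a single-byte Western charset such as Windows-1252. For example, カ is the UTF-8 byte sequence E3 82 AB, which Windows-1252 renders as ã‚«. A minimal plain-Java sketch that reproduces the pattern, assuming Windows-1252 as the wrong charset; the class and method names are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Encode text as UTF-8, then wrongly decode those bytes as windows-1252,
    // which is what happens when UTF-8 output is read with a Western default charset.
    static String garble(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, CP1252);
    }

    // Reverse the mistake: recover the original bytes and decode them as UTF-8.
    // Only lossless when every byte had a windows-1252 mapping.
    static String repair(String garbled) {
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble("カード");
        System.out.println(garbled);         // ã‚«ãƒ¼ãƒ‰
        System.out.println(repair(garbled)); // カード
    }
}
```

Bytes such as 0x81 have no Windows-1252 mapping, which is likely why some characters in the question's output (for example the お in ã?Š) show up as `?`; those cannot be recovered from the garbled string, so the real fix is to decode the OCR bytes as UTF-8 in the first place.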

Original Q&A

There is 1 answer

**Aditya** · Answer 1 · 2017-09-12T09:52:08+00:00

Yes! I got the solution. What we need to do is set the locale in our Java code, as follows:

    Locale olocale = new Locale.Builder().setLanguage("ja").setRegion("JP").build();

We can also set a locale for English in order to extract both Japanese and English text from the image.

Now it is working like a charm for me!
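The answer's locale line can be tried in isolation. A minimal sketch that builds the Japanese locale from the answer plus an English one; the `en`/`US` values and the `Locale.setDefault` call are assumptions, since the answer does not show how `olocale` is applied afterwards, and the class name is hypothetical:

```java
import java.util.Locale;

class LocaleSetup {
    // Japanese (Japan) locale, built the same way as in the answer.
    static Locale japanese() {
        return new Locale.Builder().setLanguage("ja").setRegion("JP").build();
    }

    // Hypothetical English counterpart; the "US" region is an assumption.
    static Locale english() {
        return new Locale.Builder().setLanguage("en").setRegion("US").build();
    }

    public static void main(String[] args) {
        Locale ja = japanese();
        System.out.println(ja.toLanguageTag());        // ja-JP
        System.out.println(english().toLanguageTag()); // en-US
        // Assumption: one possible way to apply it, as the JVM-wide default locale.
        Locale.setDefault(ja);
    }
}
```

Keep in mind that `Locale.setDefault` does not change `Charset.defaultCharset()`, which comes from the `file.encoding` setting when the JVM starts; if the output is still garbled, the charset used to decode and print the OCR bytes is worth checking.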

TechQA.
