Tessnet2 (Tesseract) is not returning the correct results - ways to improve output

I am using tessnet2 (tesseract-ocr) in C# on the following image:

[Image: sample text image]

This is my code:

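// Assumes: using System; using System.Drawing; using tessnet2;
var image = new Bitmap(@"D:\anuj\a2.jpg");
var ocr = new Tesseract();                              // tessnet2 OCR engine
ocr.Init(@"D:\anuj\OCRTest\tessdata", "eng", false);    // tessdata folder, language, numeric-only mode off
var result = ocr.DoOCR(image, Rectangle.Empty);         // Rectangle.Empty scans the whole image
foreach (Word word in result)
    Console.Write("{0} ", word.Text);
Console.ReadLine();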

This gives the output: Icurumcretz j

How can I get the correct text? The sample image is clear and of good resolution, yet the recognized text is still wrong. Which parameters need to be set to get the correct result?

1 answer

Janco de Vries (best answer)

You should try some image processing on your image to improve the output of tesseract. The OpenCV libraries (EmguCV for C#, I think) can help you with those image processing methods. I applied a small medianBlur to the image to reduce the noise and then made a binary image out of it.
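
To make those two steps concrete, here is a minimal sketch in plain C# with System.Drawing only. It is not the code used for this answer (that went through OpenCV); the class name OcrPreprocess, the 3x3 window and the fixed threshold are all assumptions chosen for illustration, and GetPixel/SetPixel are slow on large images.

```csharp
using System;
using System.Drawing;

// Hypothetical helper, not part of tessnet2 or OpenCV.
static class OcrPreprocess
{
    // Convert a bitmap to a grid of gray levels (0-255), indexed [row, column].
    public static byte[,] ToGray(Bitmap bmp)
    {
        var gray = new byte[bmp.Height, bmp.Width];
        for (int y = 0; y < bmp.Height; y++)
            for (int x = 0; x < bmp.Width; x++)
            {
                Color c = bmp.GetPixel(x, y);
                // Common luma weights for RGB to gray conversion.
                gray[y, x] = (byte)(0.299 * c.R + 0.587 * c.G + 0.114 * c.B);
            }
        return gray;
    }

    // 3x3 median filter: each interior pixel becomes the median of its
    // neighbourhood, which suppresses isolated noise pixels.
    // Border pixels are copied unchanged.
    public static byte[,] MedianBlur3x3(byte[,] src)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = (byte[,])src.Clone();
        var window = new byte[9];
        for (int y = 1; y < h - 1; y++)
            for (int x = 1; x < w - 1; x++)
            {
                int k = 0;
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++)
                        window[k++] = src[y + dy, x + dx];
                Array.Sort(window);
                dst[y, x] = window[4]; // middle of 9 sorted values
            }
        return dst;
    }

    // Fixed-threshold binarization: brighter than threshold -> 255, else 0.
    public static byte[,] Binarize(byte[,] src, byte threshold)
    {
        int h = src.GetLength(0), w = src.GetLength(1);
        var dst = new byte[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                dst[y, x] = src[y, x] > threshold ? (byte)255 : (byte)0;
        return dst;
    }

    // Turn the gray grid back into a bitmap that can be handed to the OCR engine.
    public static Bitmap ToBitmap(byte[,] gray)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var bmp = new Bitmap(w, h);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                bmp.SetPixel(x, y, Color.FromArgb(gray[y, x], gray[y, x], gray[y, x]));
        return bmp;
    }
}
```

With the question's code, the idea would be to pass OcrPreprocess.ToBitmap(OcrPreprocess.Binarize(OcrPreprocess.MedianBlur3x3(OcrPreprocess.ToGray(image)), 128)) to ocr.DoOCR instead of the raw image. The threshold of 128 is only a starting guess and will likely need tuning for the actual photo.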

[Image: segmented image]

Testing this image with tesseract gives me the following output: laurumoretz, plus some gibberish on the next line because I did not remove the small blobs (characters from the sticker with the phone numbers). So it's off by one character, but I did not apply a correction to make the text fully linear.
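
Removing the small blobs mentioned above could look roughly like the following sketch. It was not part of the test described here. It assumes a binary image stored as a byte grid with dark text (0) on a light background (255); BlobFilter and minArea are hypothetical names, and the right minimum area depends on the image.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of tessnet2 or OpenCV.
static class BlobFilter
{
    // Returns a copy of a binary image (0 = ink, 255 = background) in which
    // every 4-connected ink component smaller than minArea pixels is erased.
    public static byte[,] RemoveSmallBlobs(byte[,] binary, int minArea)
    {
        int h = binary.GetLength(0), w = binary.GetLength(1);
        var result = (byte[,])binary.Clone();
        var visited = new bool[h, w];
        int[] dy = { -1, 1, 0, 0 };
        int[] dx = { 0, 0, -1, 1 };

        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                if (visited[y, x] || binary[y, x] != 0) continue;

                // Flood-fill one ink component, remembering its pixels.
                var pixels = new List<(int Y, int X)>();
                var stack = new Stack<(int Y, int X)>();
                stack.Push((y, x));
                visited[y, x] = true;
                while (stack.Count > 0)
                {
                    var p = stack.Pop();
                    pixels.Add(p);
                    for (int i = 0; i < 4; i++)
                    {
                        int ny = p.Y + dy[i], nx = p.X + dx[i];
                        if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                        if (visited[ny, nx] || binary[ny, nx] != 0) continue;
                        visited[ny, nx] = true;
                        stack.Push((ny, nx));
                    }
                }

                // Components below the size limit are painted as background.
                if (pixels.Count < minArea)
                    foreach (var p in pixels)
                        result[p.Y, p.X] = 255;
            }
        return result;
    }
}
```

A minArea that is too large will also erase legitimate small marks such as the dot on an "i", so it is worth starting low and raising it until the sticker characters disappear.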

I hope this will give you a bit of an idea on how to improve the output of tesseract-ocr.