KerasOCR, EasyOCR, Pytesseract not able to recognize simple numbers


I was trying to run OCR on single-character images, all of them digits. I tried EasyOCR, Keras-OCR and Pytesseract on the images, but none returned the correct output. I even tried an MNIST model, but the output was still wrong (it returned 5 instead of 7).

What should I do? Images included.

Image of character '7'

Image of 7

Image of character '9'

Image of 9

What preprocessing steps, if any, should I add? Or should I do something else?

There is 1 answer

EL Amine Bechorfa On

Every OCR engine is trained on a different type of image (take a look at this article). To summarize: Tesseract performs well on high-resolution images. Preprocessing such as Otsu binarization and morphological operations (dilation, erosion) can help improve pytesseract's results.

EasyOCR is a lightweight model that performs well on receipts and PDF conversion. It gives more accurate results on organized text such as PDF files, receipts and bills.

Keras-OCR is an image-specific OCR tool. If the text is embedded in an image and its fonts and colors are irregular, Keras-OCR gives good results.

I recommend Tesseract for this kind of digit image. If the problem persists, try creating your own dataset of digits (or search for an existing one) and fine-tuning an existing model.

Otherwise, try running inference on entire paragraphs and words rather than on a single digit.
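If only the single-digit crops are available, one possible way to follow this advice is to stitch several crops into one row so the engine sees a multi-character line. This is my own interpretation, not a method from the article; `stitch_digits` and its `gap`, `pad` and `background` parameters are hypothetical names, and the sketch assumes grayscale uint8 crops with dark digits on a light background.

```python
import numpy as np


def stitch_digits(crops, gap=10, pad=10, background=255):
    """Join grayscale digit crops side by side into one padded strip."""
    height = max(crop.shape[0] for crop in crops)
    spacer = np.full((height, gap), background, dtype=np.uint8)

    pieces = []
    for i, crop in enumerate(crops):
        # Center shorter crops vertically by padding with background.
        extra = height - crop.shape[0]
        top = extra // 2
        padded = np.pad(crop, ((top, extra - top), (0, 0)),
                        constant_values=background)
        if i > 0:
            pieces.append(spacer)
        pieces.append(padded)

    row = np.hstack(pieces)
    # Add a margin around the whole strip.
    return np.pad(row, pad, constant_values=background)
```

The resulting strip can then be passed to any of the OCR engines as an ordinary line of text, and the recognized string split back into individual characters.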

For EasyOCR, if you want to get only digits, pass the allowed characters as a string through the allowlist parameter: reader.readtext(image, allowlist='0123456789')
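A minimal sketch of how that call might be used, assuming the default `readtext` output of `(bounding_box, text, confidence)` tuples. The helper names `read_digits` and `best_digit`, the English language list, and `gpu=False` are my own choices for illustration.

```python
DIGITS = "0123456789"


def best_digit(results):
    """Return the highest-confidence single-digit text, or None."""
    candidates = [
        (confidence, text)
        for _bbox, text, confidence in results
        if len(text) == 1 and text in DIGITS
    ]
    if not candidates:
        return None
    return max(candidates)[1]


def read_digits(image_path):
    """Run EasyOCR restricted to digits and return the best single digit."""
    # Imported here because easyocr is heavy and downloads models on first use.
    import easyocr

    reader = easyocr.Reader(["en"], gpu=False)
    results = reader.readtext(image_path, allowlist=DIGITS)
    return best_digit(results)
```

Restricting the character set only limits what the recognizer may output; if the detector finds no text region in a tiny crop, the result list is still empty, so upscaling or padding the image first may be needed.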