Image preprocessing for egg marking recognition with Tesseract

Question

Image preprocessing for egg marking recognition with Tesseract

587 views Asked by Tomasito665 At 31 August 2017 at 10:48

The goal is to make an app which can recognize egg markings, for example 0-DE-134461. I tried both Tesseract and the Google Vision API on the following images. The results from both OCR engines are disastrous.

0-DE-46042

Tesseract → ""
Google Vision API → " 2 "

3-ES08234-25591

Tesseract → ""
Google Vision API → " Es1234-2SS ) R SHAH That is part "

Cropped

I manually cropped the images with Photoshop.

0-DE-46042

Tesseract → ""
Google Vision API → ""

3-ES08234-25591

Tesseract → "3ΓÇöE503ΓÇÿ234-gg"
Google Vision API → " -ESOT23-2559 ) "

Thresholded

I color-selected the text on both eggs manually with Photoshop and removed the background.

0-DE-46042

Tesseract → "OΓÇöDEΓÇö46042"
Google Vision API → " O-DE-46042 "

3-ES08234-25591

Tesseract → ""
Google Vision API → " 3-ESO8234-9 "

Removing the circular warp?

I would assume that the last preprocessing step should be removing the circular warp, but I wouldn't know how to do that manually using Photoshop, let alone automating that.

My questions

Am I heading in the right direction?
Are my preprocessing steps correct?
What would be the approach to automate these steps in, say, OpenCV?

Extra info

The command I used to get the tesseract OCR results:

λ tesseract {egg_picture}.jpg --psm 7 stdout

The tesseract version:

λ tesseract --version
tesseract 4.0.0-alpha.20170804
 leptonica-1.74.4
  libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.5.0) : libpng 1.6.20 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.3 : libopenjp2 2.1.

Platform: Windows 10

Edit 1

I applied adaptive thresholding on some egg marking images with OpenCV. These are the results so far:

IMG A (edit 1) IMG C (edit 1) IMG B (edit 1) IMG H (edit 1) IMG D (edit 1) IMG E (edit 1) IMG G (edit 1) IMG I (edit 1)

However, there's still lots of noise. I'm struggling to adjust the parameters so that it works well across different images.

Original Q&A

There are 1 answers

**Jeru Luke** · Answer 1 · 2017-09-01T14:01:22+00:00

I have a suggestion.

I tried applying local histogram equalization for all the three channels in the BGR color space and then merged them.

Result:

With the details in the image more enhanced you can think about preprocessing on these images.

I also tried globally equalizing the histogram of the three channels separately. The images although clear than the original, lacked the depth in detail.

TechQA.

Image preprocessing for egg marking recognition with Tesseract

Cropped

Thresholded

Removing the circular warp?

My questions

Extra info

Edit 1

There are 1 answers

Related Questions in OPENCV

Related Questions in OCR

Related Questions in TESSERACT

Related Questions in GOOGLE-VISION

Popular Questions

Popular Tags

Trending Questions