Improve Tesseract Accuracy on my book images, for Google Books API


I am trying to upload my library to Goodreads. To do that, I take a picture of each book to read its details (title, author), and then I query the Google Books API to get the ISBN.
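For the lookup step, here is a minimal sketch using only the standard library. It assumes the public volumes search endpoint `https://www.googleapis.com/books/v1/volumes` with a free-text `q` parameter and no API key, and it reads ISBNs from `volumeInfo.industryIdentifiers` in the JSON response. The function names `build_query_url`, `first_isbn` and `lookup_isbn` are hypothetical helpers, not part of any library.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: Google Books volumes search (no API key used here)
GOOGLE_BOOKS_URL = "https://www.googleapis.com/books/v1/volumes"


def build_query_url(title, author=None):
    """Build a free-text search URL from the OCR'd title and optional author."""
    query = " ".join(part for part in (title, author) if part)
    return GOOGLE_BOOKS_URL + "?" + urllib.parse.urlencode({"q": query})


def first_isbn(response):
    """Return the ISBN-13 (else ISBN-10) of the first result that has one, or None."""
    for item in response.get("items", []):
        identifiers = item.get("volumeInfo", {}).get("industryIdentifiers", [])
        by_type = {i.get("type"): i.get("identifier") for i in identifiers}
        isbn = by_type.get("ISBN_13") or by_type.get("ISBN_10")
        if isbn:
            return isbn
    return None


def lookup_isbn(title, author=None):
    """Query Google Books (network call) and return the best-guess ISBN."""
    with urllib.request.urlopen(build_query_url(title, author), timeout=10) as resp:
        return first_isbn(json.load(resp))
```

Usage would be something like `lookup_isbn("The Hobbit", "Tolkien")`. Because the query is free text, passing the cleaned-up OCR words straight through is a reasonable first attempt, but the first result is only a guess and is worth checking by eye.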

Unfortunately, pytesseract is not detecting the text properly.

Here is one of the images: [book photo, not included]

The result after thresholding: [thresholded image, not included]

And here is my code:

import pytesseract
from pytesseract import Output
import cv2
import pandas as pd  # pytesseract needs pandas for Output.DATAFRAME

image_path = "book.jpg"  # placeholder: path to one of the book photos

# Read the photo of the book
img = cv2.imread(image_path)
original = img.copy()
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
# Otsu picks the threshold automatically: pixels brighter than it
# become 255 (white) and all other pixels become 0 (black)
thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
text = pytesseract.image_to_data(thresh, lang='eng', output_type=Output.DATAFRAME)
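One cheap improvement on the `image_to_data` output is to drop low-confidence words before sending anything to Google Books, since cover art often produces short garbage tokens. A minimal sketch, assuming the call is made with `output_type=Output.DICT` (which returns parallel lists keyed by `text`, `conf`, `block_num`, `par_num`, `line_num`, among others); `words_to_lines` is a hypothetical helper and the cutoff of 60 is an arbitrary starting value, not a recommended constant.

```python
def words_to_lines(data, min_conf=60):
    """Group confident words from image_to_data(..., output_type=Output.DICT) into text lines."""
    lines = {}
    for text, conf, block, par, line in zip(
        data["text"], data["conf"], data["block_num"], data["par_num"], data["line_num"]
    ):
        word = str(text).strip()
        # Skip structural rows (empty text, conf -1) and low-confidence words;
        # float() copes with conf given as either a number or a string
        if not word or float(conf) < min_conf:
            continue
        lines.setdefault((block, par, line), []).append(word)
    # Rebuild lines in block/paragraph/line order
    return [" ".join(words) for _, words in sorted(lines.items())]
```

For example, `words_to_lines(pytesseract.image_to_data(thresh, lang='eng', output_type=Output.DICT))` returns a list of strings that can be joined into a search query.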

But the results are far from great. I have tried deskewing the images and processing them in other ways, but the results did not improve.
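One processing detail that matters for book covers: Tesseract's "Improving the quality of the output" documentation advises dark text on a light background for version 4.x, and a cover with light lettering on a dark background comes out of Otsu thresholding inverted. A minimal sketch, assuming `thresh` is a 0/255 `uint8` NumPy array (as `cv2.threshold` returns) and using the heuristic that the background covers more than half the image, which will not hold for every cover; `ensure_dark_on_light` is a hypothetical name.

```python
import numpy as np


def ensure_dark_on_light(binary):
    """Invert a 0/255 binary image when most pixels are black (light text on a dark cover)."""
    # Heuristic: the background is assumed to be the majority colour
    if np.count_nonzero(binary) < binary.size / 2:
        return 255 - binary  # same effect as cv2.bitwise_not for 0/255 images
    return binary
```

Calling `thresh = ensure_dark_on_light(thresh)` just before `image_to_data` leaves already dark-on-light images unchanged.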

Is there a way to improve the image processing, or is there another library or method I should try that might work better?
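Another thing worth trying before switching libraries is Tesseract's page segmentation mode (`--psm`): the default (3) expects a page of text, while 6 assumes a single uniform block and 11 looks for sparse text, which can suit covers and spines better. A minimal sketch that runs several modes and keeps the one with the highest mean word confidence; `best_psm` and `mean_confidence` are hypothetical helpers, the OCR function is passed in so it can be swapped or faked, and mean confidence is only a heuristic (a mode that finds a single confident word can win).

```python
def mean_confidence(data):
    """Average confidence of non-empty words in an image_to_data Output.DICT result."""
    confs = [
        float(c)
        for c, t in zip(data["conf"], data["text"])
        if str(t).strip() and float(c) >= 0
    ]
    return sum(confs) / len(confs) if confs else 0.0


def best_psm(image, ocr, modes=(3, 6, 11)):
    """Run ocr(image, config=...) once per --psm mode; return (psm, data) with the best mean confidence."""
    results = []
    for psm in modes:
        data = ocr(image, config=f"--psm {psm}")
        results.append((mean_confidence(data), psm, data))
    _, psm, data = max(results, key=lambda r: r[0])
    return psm, data
```

With pytesseract, the OCR function could be `functools.partial(pytesseract.image_to_data, lang="eng", output_type=Output.DICT)`, giving `psm, data = best_psm(thresh, ocr)`.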
