OSError: Tesseract not found in environment. Check variables and PATH

I'm trying to use img2table to read the data in scanned images. I'm starting with a basic example and I keep getting an OSError: Tesseract not found in environment. Check variables and PATH.

Here's my code:

from img2table.document import Image
from img2table.ocr import TesseractOCR

img = Image("mark sheet.jpg")
tesseract = TesseractOCR()

# Extract tables using Tesseract as the OCR backend
tables = img.extract_tables(ocr=tesseract, borderless_tables=True)

tables[0].df

Here's the error:

  File "C:\Users\PC\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.11_qbz5n2kfra8p0\LocalCache\local-packages\Python311\site-packages\img2table\ocr\tesseract.py", line 56, in __init__
    raise EnvironmentError("Tesseract not found in environment. Check variables and PATH")

It seems like the library I installed isn't finding something in some environment. I have no idea how to proceed.

Please help.

There is 1 answer

Answer by gbiz123:

You are getting that error because Python cannot find the Tesseract executable on your PATH. Tesseract is an external OCR program, separate from the Python packages that use it. Wrappers such as pytesseract, and the TesseractOCR class in img2table, only call out to that program. So in addition to installing the Python package with pip, you need to install the Tesseract OCR executable itself and add its install directory to your PATH so Python can find it.

This guide seems to outline the whole process pretty well: https://ironsoftware.com/csharp/ocr/blog/ocr-tools/tesseract-ocr-windows/

Here are the official installation docs: https://tesseract-ocr.github.io/tessdoc/Installation.html
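
Once Tesseract is installed, a quick way to confirm that Python can see it is to look the executable up on PATH from the same interpreter that runs your script. The sketch below uses only the standard library. The helper name find_tesseract is made up for illustration, and the directory C:\Program Files\Tesseract-OCR is an assumption about where the Windows installer put the program, so adjust it to your actual install location.

```python
import os
import shutil


def find_tesseract(extra_dir=None):
    """Return the full path of the tesseract executable, or None if it is not on PATH.

    If extra_dir is an existing directory, it is prepended to PATH for the
    current Python process only (the system-wide PATH is left unchanged).
    """
    if extra_dir and os.path.isdir(extra_dir):
        os.environ["PATH"] = extra_dir + os.pathsep + os.environ.get("PATH", "")
    return shutil.which("tesseract")


# Assumed install directory; change this to wherever Tesseract actually lives.
ASSUMED_DIR = r"C:\Program Files\Tesseract-OCR"

location = find_tesseract(ASSUMED_DIR)
if location is None:
    print("tesseract is still not on PATH; check the install directory")
else:
    print("tesseract found at:", location)
```

If this prints a path, creating TesseractOCR() later in the same process should be able to find the program too. Changing PATH this way only lasts for the current run; editing the PATH environment variable in Windows settings, as the guides above describe, makes the fix permanent.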