I am trying to extract a few fields from an image using OCR. I am using pytesseract to read the image file, and this part works as expected.
    import pytesseract
    from PIL import Image
    import re

    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

    value = Image.open("ocr.JPG")
    text = pytesseract.image_to_string(value)
    print(text)
    ALS 1 Emergency Base Rate Y A0427 RE ABC
    Anbulance Mileage Charge Y A0425 RE ABC
    Disposable Supplies Y A0398 RH ABC
    184800230, x
Next, I have to extract A0427 and A0425 from the text. The problem is that my loop does not go through the text line by line. It takes one character at a time, which is why my regular expression isn't working.
    for line in text:
        print(line)
        x = re.findall(r'^A[0-9][0-9][0-9][0-9]', text)
        print(x)
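For reference, here is a minimal sketch of what I think the loop should look like. Iterating over a Python string yields single characters, so the text probably needs `splitlines()` first. Also, `^` only matches at the start of the string unless `re.MULTILINE` is set, and the codes sit in the middle of each line, so the sketch uses word boundaries (`\b`) instead. The names `extract_codes`, `CODE_PATTERN` and `WANTED` are hypothetical, and the sample text is copied from the output above with the line breaks assumed.

```python
import re

# Sample text copied from the OCR output above (line breaks assumed).
# In the real script this would be the string returned by
# pytesseract.image_to_string().
text = (
    "ALS 1 Emergency Base Rate Y A0427 RE ABC\n"
    "Anbulance Mileage Charge Y A0425 RE ABC\n"
    "Disposable Supplies Y A0398 RH ABC\n"
    "184800230, x\n"
)

# Hypothetical filter: the two codes the question asks for.
WANTED = {"A0427", "A0425"}

# "A" followed by exactly four digits, as a standalone token.
CODE_PATTERN = re.compile(r"\bA\d{4}\b")


def extract_codes(ocr_text):
    """Return every A#### code found, scanning the text line by line."""
    codes = []
    # splitlines() yields whole lines; iterating over the string
    # directly would yield one character at a time.
    for line in ocr_text.splitlines():
        codes.extend(CODE_PATTERN.findall(line))
    return codes


all_codes = extract_codes(text)
wanted_codes = [code for code in all_codes if code in WANTED]

print(all_codes)     # ['A0427', 'A0425', 'A0398']
print(wanted_codes)  # ['A0427', 'A0425']
```

Since `findall` scans the whole string anyway, `CODE_PATTERN.findall(text)` without the loop should give the same list. The loop is only useful if each code has to be tied back to its line.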