I am running OCR on an image file and converting it to text. Now I need to extract one specific piece of that text.

Generated text (not the complete text):

FROM: 2902 W SWEETWATER AV #1100
Phoenix, AZ 95029

TO: BANNER THUNDERBIRD MED CTR
5855 W THUNDERBIRD RD
Glendale, AZ 85307

c9 23 1975 x

I need to extract 95029 from the FROM: segment. My plan was to find the line number of the FROM: line, add one to get the next line number, and then apply a regular expression to that line to retrieve the text. But I am not able to get the text of the next line.

for num, line in enumerate(text.splitlines()):
    if 'FROM:' in line:
        num = num+1
        print(num)
        break
#print(line)

I am able to get the line number but not the text. Please suggest a fix.

2 Answers

1
Rajan On Best Solutions

Save the split lines in a variable, text_list = text.splitlines(). You can then access the line after the match with text_list[num+1].

Try something like this:

text = """FROM: 2902 W SWEETWATER AV #1100
Phoenix, AZ 95029

TO: BANNER THUNDERBIRD MED CTR
5855 W THUNDERBIRD RD
Glendale, AZ 85307

c9 23 1975 x"""



desired_line = ''
text_list = text.splitlines()

for num, line in enumerate(text_list):
    if 'FROM:' in line:
        desired_line = text_list[num+1]
        break

print(desired_line)  # prints the line after FROM:
print(desired_line.split()[-1])  # prints the ZIP code (last token on that line)
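
The question also mentions applying a regular expression to the next line. A minimal sketch of that combination follows, assuming the ZIP is always a standalone 5-digit number on the line right after the label. The helper name zip_after_label is hypothetical, not something from the question. It also guards against the label sitting on the very last line, where text_list[num+1] would raise an IndexError.

```python
import re

def zip_after_label(text, label="FROM:"):
    """Return the 5-digit ZIP on the line after `label`, or None (hypothetical helper)."""
    lines = text.splitlines()
    for num, line in enumerate(lines):
        if label in line:
            # The label is on the last line, so there is no next line to read.
            if num + 1 >= len(lines):
                return None
            # Assumption: the ZIP is a standalone run of exactly five digits.
            match = re.search(r"\b\d{5}\b", lines[num + 1])
            return match.group(0) if match else None
    return None  # label not found at all

text = """FROM: 2902 W SWEETWATER AV #1100
Phoenix, AZ 95029

TO: BANNER THUNDERBIRD MED CTR
5855 W THUNDERBIRD RD
Glendale, AZ 85307

c9 23 1975 x"""

print(zip_after_label(text))
```

Returning None instead of raising keeps the caller simple when the OCR output is incomplete, which the question says can happen.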
0
Peipei On

You can also do it with a regular expression, provided the FROM address always has the same format. A similar regex can be applied to find the TO address and its zip code.

text = """FROM: 2902 W SWEETWATER AV #1100
Phoenix, AZ 95029

TO: BANNER THUNDERBIRD MED CTR
5855 W THUNDERBIRD RD
Glendale, AZ 85307 

c9 23 1975 x"""

import re

# Raw string so that \s and \d reach the regex engine unchanged.
res = re.search(r"FROM:.*\n(([a-zA-Z]+),\s*([A-Z]{2})\s+(\d{5})\n)", text)
if res is not None:
    print(res.group(0))  # whole FROM address (both lines)
    print(res.group(1))  # city, state zip: Phoenix, AZ 95029
    print(res.group(4))  # zip: 95029
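
For the TO address mentioned above, one possible sketch is below. It assumes the city line always looks like "City, ST 12345" and may be preceded by any number of street lines. The function name city_state_zip is hypothetical, and the city pattern is loosened to allow spaces and periods so that multi-word city names are not rejected.

```python
import re

def city_state_zip(text, label):
    """Return (city, state, zip) for the block starting at `label`, or None (hypothetical helper)."""
    pattern = re.compile(
        r"^" + re.escape(label) + r".*\n"          # the label line itself
        r"(?:.*\n)*?"                               # lazily skip any street lines
        r"([A-Za-z .]+),\s*([A-Z]{2})\s+(\d{5})",  # City, ST 12345
        re.MULTILINE,
    )
    match = pattern.search(text)
    return match.groups() if match else None

text = """FROM: 2902 W SWEETWATER AV #1100
Phoenix, AZ 95029

TO: BANNER THUNDERBIRD MED CTR
5855 W THUNDERBIRD RD
Glendale, AZ 85307 

c9 23 1975 x"""

print(city_state_zip(text, "FROM:"))
print(city_state_zip(text, "TO:"))
```

One caveat with the lazy skip: if a block has no city line of its own, the search keeps going and can pick up the city line of a later block, so this is only safe when every address is complete.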