Here's the gist of the code:
import fitz  # PyMuPDF
import easyocr
from PIL import Image

def extract_text_from_pdf(pdf_path):
    reader = easyocr.Reader(['en'], download_enabled=False)
    pdf_document = fitz.open(pdf_path)
    extracted_text = ""
    for page_number in range(pdf_document.page_count):
        page = pdf_document[page_number]
        # Render the page at 300 DPI (PDF user space is 72 points per inch)
        resolution = 300
        zoomfactor = resolution / 72.0
        pixmap = page.get_pixmap(matrix=fitz.Matrix(zoomfactor, zoomfactor))
        image = pixmap.tobytes()  # PNG-encoded bytes
        result = reader.readtext(image, paragraph=True)
        print(f"Page {page_number + 1} - OCR Result:")
        for detection in result:
            # detection[1] is the recognized text of one merged box
            extracted_text += detection[1]
    pdf_document.close()
    return extracted_text
The image passed looks something like this:
But the extracted text looks like this: "account: 1234url: xyz"
The expectation is:
"account: 1234
url: xyz"
It seems like EasyOCR is extracting each word separately instead of reading the image line by line, probably because there's a huge space between the words on a single line.
Can you please suggest something?

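According to the EasyOCR documentation, readtext accepts a group of Bounding Box Merging parameters: slope_ths, ycenter_ths, height_ths, width_ths, add_margin, x_ths and y_ths.
Modifying one of these should do the trick. With paragraph=True, x_ths (the maximum horizontal distance for merging boxes) and y_ths (the maximum vertical distance) are the relevant ones.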
Update:
According to the OP, setting x_ths to 1000.0 did solve the issue.
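
For reference, here is a minimal sketch of what that call might look like. The helper name ocr_lines is hypothetical, and joining the results with a newline is my own choice rather than something taken from the question. It assumes reader is an easyocr.Reader and that, with paragraph=True, each result is a [bounding_box, text] pair.

```python
def ocr_lines(reader, image, x_ths=1000.0):
    """Run OCR and return one merged text box per output line.

    `reader` is assumed to be an easyocr.Reader (anything exposing a
    compatible readtext method works). A large x_ths lets boxes that are
    far apart horizontally be merged into the same line.
    """
    results = reader.readtext(image, paragraph=True, x_ths=x_ths)
    # In paragraph mode each item is [bounding_box, text]; keep only the text.
    return "\n".join(item[1] for item in results)

# Usage (requires easyocr to be installed):
#   import easyocr
#   reader = easyocr.Reader(['en'])
#   print(ocr_lines(reader, "page.png"))
```

If lines that sit close together vertically end up merged into one box, lowering y_ths may help keep them apart.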