How do I convert a multi-page PDF into one PNG image per page in Python?

Amateur Python developer here. I'm working on a project where I take multiple PDFs, each with a varying number of pages (roughly 1 to 20), and turn them into PNG files to use with pytesseract later.

I'm using pdf2image and poppler on a test PDF that has 3 pages. The problem is that only the last page of the PDF ends up as a PNG. I thought, "Maybe the program is giving every PDF page the same file name, and each iteration overwrites the file until only the last page remains." So I tried to write the program so it would change the file name on each iteration. Here's the code:

from pdf2image import convert_from_path
images = convert_from_path('/Users/jacobpatty/vscode_projects/badger_colors/test_ai/10254_Craigs_Plumbing.pdf', 200)

file_name = 'ping_from_ai_test.png'
file_number = 0
for image in images:
    file_number =+ 1
    file_name = 'ping_from_ai_test' + str(file_number) + '.png'
    image.save(file_name)

This failed in 2 ways. It only made 2 PNG files ('ping_from_ai_test.png' and 'ping_from_ai_test1.png') instead of 3, and when I opened them, both showed only the last PDF page. I don't know what to do at this point. Any ideas?

Answer by IncredibleReinforcement (accepted answer)

As far as I can see, your code only ever writes a single file, and the cause is a typo in the loop.

The line

file_number =+ 1

is parsed as a plain assignment of the unary expression +1, so it resets the counter to 1 every time:

file_number = (+1)

It should be the augmented assignment

file_number += 1
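To see the difference without involving pdf2image at all, here is a minimal, self-contained sketch that runs both forms in the same kind of loop. The function names count_with_typo and count_correctly are hypothetical, chosen only for illustration.

```python
def count_with_typo(n):
    counter = 0
    for _ in range(n):
        counter =+ 1  # parsed as counter = (+1): rebinds to 1 on every pass
    return counter


def count_correctly(n):
    counter = 0
    for _ in range(n):
        counter += 1  # augmented assignment: adds 1 on every pass
    return counter


print(count_with_typo(3))   # 1
print(count_correctly(3))   # 3
```

Because the typo version always ends at 1, every page in the original loop is saved as ping_from_ai_test1.png, each save replacing the previous one, so only the last page is left. The extra ping_from_ai_test.png is most likely left over from an earlier run, since the code shown never saves to that name.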

Answer by JHW

Try this instead of looping with for image in images:

for n in range(len(images)):
    images[n].save('test' + str(n) + '.png')

Does that work?
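A slightly fuller sketch of the same idea is below. It uses enumerate, which numbers each page the same way range(len(images)) does while still looping over the images directly. It assumes pdf2image and poppler are installed as in the question and that convert_from_path returns one PIL image per page; the helper names save_pages and convert_pdf_to_pngs, the output folder, and the file-name stem are hypothetical.

```python
import os


def save_pages(images, out_dir, stem):
    """Save each page image as <stem>_<NN>.png in out_dir and return the paths.

    images can be any sequence of objects with a PIL-style save(path) method,
    such as the list returned by pdf2image.convert_from_path.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    # enumerate(..., start=1) yields 1, 2, 3, ... so every page gets its own file name
    for page_number, image in enumerate(images, start=1):
        # Zero-padding (01, 02, ...) keeps the files in page order when sorted by name
        path = os.path.join(out_dir, f"{stem}_{page_number:02d}.png")
        image.save(path)  # for PIL images, the .png extension selects the PNG format
        paths.append(path)
    return paths


def convert_pdf_to_pngs(pdf_path, out_dir, stem, dpi=200):
    """Render every page of pdf_path and save one PNG per page (needs pdf2image + poppler)."""
    # Imported here so save_pages can be used and tested without pdf2image installed
    from pdf2image import convert_from_path

    pages = convert_from_path(pdf_path, dpi)  # one PIL image per page
    return save_pages(pages, out_dir, stem)
```

For the 3-page test PDF, convert_pdf_to_pngs("10254_Craigs_Plumbing.pdf", "png_pages", "ping_from_ai_test") should write ping_from_ai_test_01.png, ping_from_ai_test_02.png and ping_from_ai_test_03.png into png_pages. Two digits of padding cover the 1 to 20 page range mentioned in the question; widen the format if a PDF could have 100 pages or more.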