Python pdf2image: "May not be a PDF file" error


On CentOS 8, I get an error when converting PDF pages to JPG files with Python:

from pdf2image import convert_from_path

# Render each page at 500 DPI, then save it as page0.jpg, page1.jpg, ...
images = convert_from_path("test.pdf", dpi=500)
for i, image in enumerate(images):
    image.save(f"page{i}.jpg", "JPEG")

Running it gives the error below. The PDF opens fine locally, but converting it to JPG fails.

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 479, in pdfinfo_from_path
    raise ValueError
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pdf_conv.py", line 7, in <module>
    images = convert_from_path(pdf_path,500)
  File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 98, in convert_from_path
    page_count = pdfinfo_from_path(pdf_path, userpw, poppler_path=poppler_path)["Pages"]
  File "/usr/local/lib/python3.6/site-packages/pdf2image/pdf2image.py", line 489, in pdfinfo_from_path
    "Unable to get page count.\n%s" % err.decode("utf8", "ignore")
pdf2image.exceptions.PDFPageCountError: Unable to get page count.
Syntax Warning: May not be a PDF file (continuing anyway)
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't find trailer dictionary
Syntax Error: Couldn't read xref table
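The traceback shows where this fails: convert_from_path first calls pdfinfo_from_path, which runs poppler's pdfinfo tool to read the page count. When pdfinfo's output has no "Pages" entry, a ValueError is raised and then re-raised as PDFPageCountError carrying poppler's stderr (the "Syntax Warning"/"Syntax Error" lines). Below is a simplified, stdlib-only sketch of that parsing step; parse_pdfinfo_output is a hypothetical name, and this is not pdf2image's actual source, which may differ between versions.

```python
def parse_pdfinfo_output(stdout: str, stderr: str = "") -> dict:
    """Parse the 'Key: value' lines printed by poppler's pdfinfo.

    Hypothetical helper approximating the step seen in the traceback:
    if no 'Pages' entry is present, the page count is unknown and an
    error carrying pdfinfo's stderr is raised.
    """
    info = {}
    for line in stdout.splitlines():
        # Split on the first colon only, so values such as timestamps keep theirs
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    if "Pages" not in info:
        # pdf2image reports this case as PDFPageCountError; ValueError is used here
        raise ValueError("Unable to get page count.\n" + stderr)
    info["Pages"] = int(info["Pages"])
    return info


if __name__ == "__main__":
    # Output for a readable file: the page count is found
    print(parse_pdfinfo_output("Title: test\nPages: 3\nPDF version: 1.4\n")["Pages"])
    # Output for a damaged file: pdfinfo prints only warnings on stderr
    try:
        parse_pdfinfo_output("", "Syntax Error: Couldn't read xref table\n")
    except ValueError as err:
        print(err)
```

In other words, the page-count error is only a symptom: poppler could not parse the file, so pdf2image never got as far as rendering it.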

Answer by Patrick Artner:

PDF != PDF: there are different versions of the format. Perhaps pdf2image does not like or know the kind of PDF you are feeding it. Use Acrobat Reader or a similar tool to check what you are trying to convert, and see whether pdf2image supports it.
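To check what you are converting without a viewer, you can look at the file's raw bytes: a PDF normally starts with a %PDF-x.y header that declares its version and normally ends with an %%EOF marker after the trailer. A minimal stdlib sketch, assuming those conventions; inspect_pdf is a hypothetical name, and the 1024-byte header window and 2048-byte tail window are assumptions rather than values taken from pdf2image or poppler. Passing this check does not guarantee poppler can render the file.

```python
import os
import re

# Matches a header such as %PDF-1.4 or %PDF-2.0 and captures the version
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)")


def inspect_pdf(path) -> dict:
    """Report the declared PDF version and whether an %%EOF marker is present."""
    with open(path, "rb") as f:
        head = f.read(1024)          # assumed header search window
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - 2048))  # assumed tail window for %%EOF
        tail = f.read()
    match = HEADER_RE.search(head)
    return {
        "size": size,
        # None means no %PDF- header was found, so it may not be a PDF at all
        "version": match.group(1).decode("ascii") if match else None,
        # A missing %%EOF may indicate a truncated file
        "has_eof": b"%%EOF" in tail,
    }
```

For example, calling inspect_pdf("test.pdf") on the server and on the local machine and comparing the results shows whether both copies are the same, intact PDF and which version each declares.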

See "Which ISO standards does pdf2image support" (in short: pdf2image supports all PDF standards that poppler supports).