I am trying to rebuild and update a script that finds all images in a PDF and saves them. I can locate all the images in the PDF with pdfrw, but when I try to turn one of them into a Pillow Image object I get an error. I have tried several approaches. I know that Image.open() expects a binary file-like object.
This is the line I can't resolve:
im = Image.open(image)
Does anyone have an idea how to bridge the gap between the pdfrw image object and Pillow?
Error:
Traceback (most recent call last):
  File "procPDF.py", line 35, in <module>
    find_images(reader)
  File "procPDF.py", line 29, in find_images
    find_images(item, visited)
  File "procPDF.py", line 29, in find_images
    find_images(item, visited)
  File "procPDF.py", line 29, in find_images
    find_images(item, visited)
  [Previous line repeated 4 more times]
  File "procPDF.py", line 25, in find_images
    process_image(obj)
  File "procPDF.py", line 13, in process_image
    im = Image.open(image)
  File "/Users/eriknorden/Dokumente/Working-Space/230831 PDF Project/.venv/lib/python3.8/site-packages/PIL/Image.py", line 3222, in open
    fp.seek(0)
TypeError: 'NoneType' object is not callable
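One possible reading of this traceback, assuming pdfrw's PdfDict returns None for attributes it does not have instead of raising AttributeError: Image.open() looks up fp.seek on the dictionary, receives None, and then tries to call it. The sketch below reproduces that behaviour with a hypothetical stand-in class, FakePdfDict, so it runs without pdfrw or Pillow; it is an illustration of the suspected mechanism, not pdfrw's actual code.

```python
from io import BytesIO


class FakePdfDict(dict):
    """Hypothetical stand-in mimicking the assumed PdfDict attribute lookup."""

    def __getattr__(self, name):
        # Unknown attributes are looked up as "/name" keys; a missing key
        # yields None instead of raising AttributeError.
        return self.get("/" + name)


def call_seek(fp):
    """Repeat the failing step from the traceback: fp.seek(0)."""
    try:
        fp.seek(0)
    except TypeError as exc:
        return str(exc)
    return None


image = FakePdfDict({"/Filter": "/DCTDecode"})
message = call_seek(image)           # the dictionary has no real seek()
ok = call_seek(BytesIO(b"data"))     # a real binary file object does

print(message)
print(ok)
```

If this reading is right, the object passed to Image.open() is the image dictionary itself, not the bytes it carries, which is why a file-like wrapper around the stream data is needed.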
This is the code:
import sys
import os
import zlib
from PIL import Image
#from io import StringIO, BytesIO
from pdfrw import PdfReader, PdfDict, PdfArray, PdfName, PdfWriter


def process_image(image):
    if image["/Filter"] == PdfName("FlateDecode"):
        return
    elif image["/Filter"] == PdfName("DCTDecode"):
        im = Image.open(image)


def find_images(obj, visited=set()):
    if not isinstance(obj, (PdfDict, PdfArray)):
        return
    myId = id(obj)
    if myId in visited:
        return
    visited.add(myId)
    if isinstance(obj, PdfDict):
        if obj.Type == PdfName.XObject and obj.Subtype == PdfName.Image:
            process_image(obj)
        obj = obj.itervalues()

    for item in obj:
        find_images(item, visited)


if __name__ == '__main__':
    inpfn, outfn = sys.argv[1:]
    reader = PdfReader(inpfn)
    find_images(reader)
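
A minimal sketch of one way the bridge might be built, under two assumptions that the post does not confirm: that pdfrw exposes the raw stream data as a str decoded with Latin-1 (via image.stream), and that a DCTDecode stream is a complete JPEG file. The helper name stream_to_file is hypothetical. The block uses only the standard library, so a short sample string stands in for image.stream.

```python
from io import BytesIO


def stream_to_file(stream):
    """Wrap PDF stream data in a binary file-like object.

    Accepts bytes, or a str that was decoded with Latin-1 (assumed to be
    how pdfrw hands out stream data on Python 3).
    """
    if isinstance(stream, str):
        # Latin-1 maps code points 0-255 one-to-one onto byte values,
        # so encoding restores the original bytes.
        stream = stream.encode("latin-1")
    return BytesIO(stream)


# Stand-in for image.stream: the first bytes of a JFIF JPEG header.
sample_stream = "\xff\xd8\xff\xe0\x00\x10JFIF\x00"

fp = stream_to_file(sample_stream)
header = fp.read(2)
fp.seek(0)  # rewind so a later reader starts at the beginning

print(header == b"\xff\xd8")
```

In process_image, the DCTDecode branch could then try Image.open(stream_to_file(image.stream)) in place of Image.open(image); this has not been tested against the PDF in question. FlateDecode streams carry zlib-compressed pixel data, not a complete image file, so they would likely need a different route.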