How to get an image from pdfrw to be readable for Pillow

Asked by Erikson_Nordenson on 01 September 2023 at 19:12 (45 views)

I am trying to rebuild and update a script that finds all images in a PDF and saves them. I can locate all the images in the PDF with pdfrw, but when I try to turn one of them into a Pillow object I get an error. I have tried several approaches. I know that Image.open() expects a filename or a binary file object.

This is the line I can't resolve:

im = Image.open(image)

Does anyone have an idea how this bridge could be built?
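
One possible bridge, sketched under assumptions rather than confirmed against this PDF: Image.open() accepts a filename or a file-like object, while a pdfrw image XObject is a dictionary whose raw data sits in its stream attribute. Assuming pdfrw on Python 3 exposes that stream as a str with one character per byte (Latin-1), and assuming the image uses a single /DCTDecode filter (so the stream is already a complete JPEG file), the stream can be encoded back to bytes and wrapped in io.BytesIO. The helper names jpeg_bytes_from_xobject and open_with_pillow are hypothetical.

```python
import io


def jpeg_bytes_from_xobject(image):
    """Return the raw JPEG bytes of a /DCTDecode image XObject, else None.

    `image` is expected to behave like a pdfrw PdfDict: `image.Filter`
    holds the filter name and `image.stream` holds the undecoded data.
    """
    # Assumption: a single filter; filter arrays are not handled here.
    if str(image.Filter) != "/DCTDecode":
        return None
    stream = image.stream
    if stream is None:
        return None
    if isinstance(stream, bytes):
        return stream
    # Assumption: pdfrw (Python 3) keeps streams as Latin-1 decoded text,
    # so encoding with Latin-1 restores the original bytes.
    return stream.encode("latin-1")


def open_with_pillow(data):
    """Open in-memory image bytes with Pillow."""
    from PIL import Image  # imported lazily; requires Pillow to be installed

    return Image.open(io.BytesIO(data))
```

Inside process_image, the failing call could then become something like im = open_with_pillow(jpeg_bytes_from_xobject(image)). /FlateDecode images are a different case: their stream holds compressed raw pixel data rather than an image file, so they would need decompressing and reassembling from /Width, /Height and /ColorSpace instead.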

Error:

Traceback (most recent call last):
  File "procPDF.py", line 35, in <module>
    find_images(reader)
  File "procPDF.py", line 29, in find_images
    find_images(item, visited)
  File "procPDF.py", line 29, in find_images
    find_images(item, visited)
  File "procPDF.py", line 29, in find_images
    find_images(item, visited)
  [Previous line repeated 4 more times]
  File "procPDF.py", line 25, in find_images
    process_image(obj)
  File "procPDF.py", line 13, in process_image
    im = Image.open(image)
  File "/Users/eriknorden/Dokumente/Working-Space/230831 PDF Project/.venv/lib/python3.8/site-packages/PIL/Image.py", line 3222, in open
    fp.seek(0)
TypeError: 'NoneType' object is not callable
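
A likely reading of this traceback, offered as an assumption rather than a confirmed diagnosis: Image.open() was handed the PdfDict itself, and Pillow treats anything that is not a path as a file object and calls fp.seek(0) on it. pdfrw dictionaries appear to answer attribute lookups for missing keys with None instead of raising AttributeError, so fp.seek evaluates to None and calling it fails. The stand-in class below (hypothetical, not pdfrw's actual code) reproduces the same message using only the standard library.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a dictionary with attribute-style access."""

    def __getattr__(self, name):
        # Missing keys yield None rather than raising AttributeError.
        return self.get("/" + name)


def try_seek(fp):
    """Mimic the first thing a file consumer does: rewind the stream."""
    try:
        fp.seek(0)
    except TypeError as exc:
        return str(exc)
    return "ok"


fake_image = AttrDict({"/Filter": "/DCTDecode"})
message = try_seek(fake_image)
print(message)  # 'NoneType' object is not callable
```

If that reading is right, the fix is to pass Pillow a real file-like object built from the image data, not the dictionary.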

This is the code:

import sys
import os
import zlib
from PIL import Image 
#from io import StringIO, BytesIO

from pdfrw import PdfReader, PdfDict, PdfArray, PdfName, PdfWriter

def process_image(image):
  if image["/Filter"] == PdfName("FlateDecode"):
    return
  elif image["/Filter"] == PdfName("DCTDecode"):
    im = Image.open(image)

def find_images(obj, visited=set()):
  if not isinstance(obj, (PdfDict, PdfArray)):
    return
  myId = id(obj)
  if myId in visited:
    return
  visited.add(myId)
  
  if isinstance(obj, PdfDict):
    if obj.Type == PdfName.XObject and obj.Subtype == PdfName.Image:
      process_image(obj)
    obj = obj.itervalues()
  
  for item in obj:
    find_images(item, visited)
   

if __name__ == '__main__':
    inpfn,outfn = sys.argv[1:]
    reader = PdfReader(inpfn)
    find_images(reader)
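
A side note on the traversal, as a sketch and not a drop-in replacement: the default visited=set() is created once when the function is defined, so a second top-level call would silently reuse the ids recorded by the first. A common alternative is a None default combined with a generator that yields the matches. Plain dict and list objects stand in for PdfDict and PdfArray here (an assumption made so the example runs without pdfrw), and iter_image_dicts is a hypothetical name.

```python
def iter_image_dicts(obj, visited=None):
    """Yield every nested dict that looks like an image XObject."""
    if visited is None:
        visited = set()  # fresh set for each top-level call
    if not isinstance(obj, (dict, list)):
        return
    if id(obj) in visited:
        return  # already seen; guards against reference cycles
    visited.add(id(obj))

    if isinstance(obj, dict):
        if obj.get("/Type") == "/XObject" and obj.get("/Subtype") == "/Image":
            yield obj
        children = list(obj.values())
    else:
        children = obj

    for child in children:
        yield from iter_image_dicts(child, visited)


# Tiny stand-in document containing a cycle and one image.
image = {"/Type": "/XObject", "/Subtype": "/Image", "/Filter": "/DCTDecode"}
page = {"/Type": "/Page", "/Resources": {"/XObject": {"/Im0": image}}}
page["/Parent"] = {"/Kids": [page]}

found = list(iter_image_dicts(page))
print(len(found))  # 1
```

Yielding the image dictionaries keeps the search separate from whatever is done with each image afterwards (opening, converting, saving).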


There are 0 answers
