pyPdf IndirectObject in /Rotate


We have a simple script that reads incoming PDF files and, if a page is landscape, rotates it to portrait for later consumption by another program. All was running well with pyPdf until I ran into a file with an IndirectObject as the value of the /Rotate key on the page. The object is resolvable, so I can tell what the /Rotate value is, but rotateClockwise and rotateCounterClockwise both raise a TypeError because pyPdf isn't expecting an IndirectObject in /Rotate. I've spent quite a while trying to override the IndirectObject with its resolved value, but I haven't gotten anywhere. I even tried passing the same IndirectObject to rotateCounterClockwise as the angle, and it raises a similar TypeError, one line earlier in pdf.pyc.

Put simply, my question is: is there a patch for pyPdf or PyPDF2 that keeps it from choking on this kind of setup, a different way I can go about rotating the page, or another library that I haven't considered yet? I've tried PyPDF2 and it has the same issue. I have looked at PDFMiner as a replacement, but it seems to be geared toward getting information out of PDF files rather than manipulating them. Here's the output from experimenting with the file using pyPdf in IPython; the output for PyPDF2 was very similar, with slightly different formatting:

In [1]: from pyPdf import PdfFileReader

In [2]: mypdf = PdfFileReader(open("RP121613.pdf","rb"))

In [3]: mypdf.getNumPages()
Out[3]: 1

In [4]: mypdf.resolvedObjects
Out[4]: 
{0: {1: {'/Pages': IndirectObject(2, 0), '/Type': '/Catalog'},
     2: {'/Count': 1, '/Kids': [IndirectObject(4, 0)], '/Type': '/Pages'},
     4: {'/Count': 1,
     '/Kids': [IndirectObject(5, 0)],
     '/Parent': IndirectObject(2, 0),
     '/Type': '/Pages'},
     5: {'/Contents': IndirectObject(6, 0),
     '/MediaBox': [0, 0, 612, 792],
     '/Parent': IndirectObject(4, 0),
     '/Resources': IndirectObject(7, 0),
     '/Rotate': IndirectObject(8, 0),
     '/Type': '/Page'}}}

In [5]: mypage = mypdf.getPage(0)

In [6]: myrotation = mypage.get("/Rotate")

In [7]: myrotation
Out[7]: IndirectObject(8, 0)

In [8]: mypdf.getObject(myrotation)
Out[8]: 0

In [9]: mypage.rotateCounterClockwise(90)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/root/<ipython console> in <module>()

/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
   1049     def rotateCounterClockwise(self, angle):
   1050         assert angle % 90 == 0
-> 1051         self._rotate(-angle)
   1052         return self
   1053 

/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
   1054     def _rotate(self, angle):
   1055         currentAngle = self.get("/Rotate", 0)
-> 1056         self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
   1057 
   1058     def _mergeResources(res1, res2, resource):

TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'

In [10]: mypage.rotateClockwise(90)       
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/root/<ipython console> in <module>()

/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateClockwise(self, angle)
   1039     def rotateClockwise(self, angle):
   1040         assert angle % 90 == 0
-> 1041         self._rotate(angle)
   1042         return self
   1043 

/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
   1054     def _rotate(self, angle):
   1055         currentAngle = self.get("/Rotate", 0)
-> 1056         self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
   1057 
   1058     def _mergeResources(res1, res2, resource):

TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'

In [11]: mypage.rotateCounterClockwise(myrotation)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/root/<ipython console> in <module>()

/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
   1048     # @param angle Angle to rotate the page.  Must be an increment of 90 deg.

   1049     def rotateCounterClockwise(self, angle):
-> 1050         assert angle % 90 == 0
   1051         self._rotate(-angle)
   1052         return self

TypeError: unsupported operand type(s) for %: 'IndirectObject' and 'int'

I'll gladly supply the file I'm working with if someone wants to take an in-depth look at it.


There are 2 answers

arainchi On

You need to call getObject() on the IndirectObject instance itself, so in your case it should be:

myrotation.getObject()
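To turn that into a workaround for the rotation calls, one option is to overwrite the indirect /Rotate entry with its resolved value before rotating. This is only a minimal sketch: flatten_rotate is a hypothetical helper name, not part of pyPdf, and the name and number classes are passed in as parameters so the snippet does not depend on a particular library version. It assumes the page behaves like a dict and that indirect values expose getObject(), as in the session above.

```python
def flatten_rotate(page, name_cls, number_cls):
    """Replace a possibly indirect /Rotate entry on `page` with a direct number.

    `flatten_rotate` is a hypothetical helper, not a pyPdf function.
    `name_cls` and `number_cls` are the classes used to build the key and
    the value (assumed to be NameObject and NumberObject in real use).
    Returns the resolved angle as a plain int.
    """
    value = page.get("/Rotate", 0)
    # An IndirectObject exposes getObject(); a plain Python int does not.
    if hasattr(value, "getObject"):
        value = value.getObject()
    angle = int(value)
    # Store the value directly on the page so later arithmetic sees a number.
    page[name_cls("/Rotate")] = number_cls(angle)
    return angle
```

With pyPdf this would be called as flatten_rotate(mypage, NameObject, NumberObject), with both classes imported from pyPdf.generic, before calling mypage.rotateClockwise(90). I have not verified it against the file from the question.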
lusteri On

I realize this is an old question, but I came across this post while searching for a fix, before I found my own solution. From what I understand, it was a bug: https://github.com/py-pdf/PyPDF2/pull/338/files

In summary, I edited the PyPDF2 source directly to implement the fix. Open PyPDF2/pdf.py, search for the line def _rotate(self, angle): and replace the method with the following:

def _rotate(self, angle):
    rotateObj = self.get("/Rotate", 0)
    currentAngle = rotateObj if isinstance(rotateObj, int) else rotateObj.getObject()
    self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)

It now works like a charm.
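If editing the installed library is not an option, the same logic can probably be applied at runtime by replacing _rotate on the page class. The following is a sketch under that assumption: install_rotate_patch is a hypothetical helper name, and the page, name, and number classes are passed in as parameters rather than imported, so nothing here depends on a particular PyPDF2 layout.

```python
def install_rotate_patch(page_cls, name_cls, number_cls):
    """Replace page_cls._rotate with a version that resolves indirect /Rotate.

    `install_rotate_patch` is a hypothetical helper, not part of PyPDF2.
    The replacement body mirrors the edited _rotate method shown above.
    """
    def _rotate(self, angle):
        rotate_obj = self.get("/Rotate", 0)
        # Direct numbers are ints; anything else is assumed to be an
        # indirect reference that getObject() resolves to a number.
        if isinstance(rotate_obj, int):
            current_angle = rotate_obj
        else:
            current_angle = rotate_obj.getObject()
        self[name_cls("/Rotate")] = number_cls(current_angle + angle)

    page_cls._rotate = _rotate
```

Assuming the PyPDF2 1.x module layout, it would be called once at startup as install_rotate_patch(PageObject, NameObject, NumberObject), with PageObject imported from PyPDF2.pdf and the other two from PyPDF2.generic. The patch then survives library reinstalls, at the cost of relying on a private method name.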