We have a simple script that reads incoming PDF files and, if a page is landscape, rotates it to portrait for later consumption by another program. All was running well with pyPdf until I ran into a file with an IndirectObject as the value of the /Rotate key on the page. The object is resolvable, so I can tell what the /Rotate value is, but when I call rotateClockwise or rotateCounterClockwise I get a traceback because pyPdf isn't expecting an IndirectObject in /Rotate. I've done quite a bit of experimenting with the file, trying to override the IndirectObject with its value, but I haven't gotten anywhere. I even tried passing the same IndirectObject to rotateCounterClockwise, and it throws a similar TypeError one line earlier in pdf.pyc (at the angle % 90 assertion).
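Deciding whether a page is landscape depends on its current /Rotate value as well as its MediaBox, which is why an unresolved IndirectObject in /Rotate gets in the way. Below is a minimal sketch of that decision. It assumes the caller passes the MediaBox width and height (upper-right minus lower-left) and a /Rotate value that has already been resolved to a plain number. The function name `clockwise_turn_to_portrait` is hypothetical and not part of pyPdf.

```python
def clockwise_turn_to_portrait(width, height, rotate=0):
    """Return the extra clockwise rotation (0 or 90) that makes a page display as portrait.

    width/height come from the page's MediaBox; rotate is the page's /Rotate
    value, which must already be a plain number (not an IndirectObject).
    """
    rotate = int(rotate) % 360
    # A /Rotate of 90 or 270 swaps the displayed width and height.
    if rotate in (90, 270):
        width, height = height, width
    # Landscape as displayed: one quarter turn clockwise makes it portrait.
    return 90 if width > height else 0
```

For the page in the transcript below (MediaBox [0, 0, 612, 792], /Rotate resolving to 0), `clockwise_turn_to_portrait(612, 792, 0)` returns 0, meaning the page already displays as portrait.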
Put simply, my question is: is there a patch for pyPdf or PyPDF2 that keeps it from choking on this kind of setup, a different way I can go about rotating the page, or a different library that I haven't seen or considered yet? I've tried PyPDF2 and it has the same issue. I have looked at PDFMiner as a replacement, but it seems more geared toward extracting information from PDF files than manipulating them. Here's the output from my experiments with the file using pyPdf in IPython; the output for PyPDF2 was very similar, though some of the formatting was slightly different:
In [1]: from pyPdf import PdfFileReader
In [2]: mypdf = PdfFileReader(open("RP121613.pdf","rb"))
In [3]: mypdf.getNumPages()
Out[3]: 1
In [4]: mypdf.resolvedObjects
Out[4]:
{0: {1: {'/Pages': IndirectObject(2, 0), '/Type': '/Catalog'},
     2: {'/Count': 1, '/Kids': [IndirectObject(4, 0)], '/Type': '/Pages'},
     4: {'/Count': 1,
         '/Kids': [IndirectObject(5, 0)],
         '/Parent': IndirectObject(2, 0),
         '/Type': '/Pages'},
     5: {'/Contents': IndirectObject(6, 0),
         '/MediaBox': [0, 0, 612, 792],
         '/Parent': IndirectObject(4, 0),
         '/Resources': IndirectObject(7, 0),
         '/Rotate': IndirectObject(8, 0),
         '/Type': '/Page'}}}
In [5]: mypage = mypdf.getPage(0)
In [6]: myrotation = mypage.get("/Rotate")
In [7]: myrotation
Out[7]: IndirectObject(8, 0)
In [8]: mypdf.getObject(myrotation)
Out[8]: 0
In [9]: mypage.rotateCounterClockwise(90)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
1049 def rotateCounterClockwise(self, angle):
1050 assert angle % 90 == 0
-> 1051 self._rotate(-angle)
1052 return self
1053
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
1054 def _rotate(self, angle):
1055 currentAngle = self.get("/Rotate", 0)
-> 1056 self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
1057
1058 def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [10]: mypage.rotateClockwise(90)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateClockwise(self, angle)
1039 def rotateClockwise(self, angle):
1040 assert angle % 90 == 0
-> 1041 self._rotate(angle)
1042 return self
1043
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in _rotate(self, angle)
1054 def _rotate(self, angle):
1055 currentAngle = self.get("/Rotate", 0)
-> 1056 self[NameObject("/Rotate")] = NumberObject(currentAngle + angle)
1057
1058 def _mergeResources(res1, res2, resource):
TypeError: unsupported operand type(s) for +: 'IndirectObject' and 'int'
In [11]: mypage.rotateCounterClockwise(myrotation)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/root/<ipython console> in <module>()
/usr/lib/python2.7/site-packages/pyPdf/pdf.pyc in rotateCounterClockwise(self, angle)
1048 # @param angle Angle to rotate the page. Must be an increment of 90 deg.
1049 def rotateCounterClockwise(self, angle):
-> 1050 assert angle % 90 == 0
1051 self._rotate(-angle)
1052 return self
TypeError: unsupported operand type(s) for %: 'IndirectObject' and 'int'
I'll gladly supply the file I'm working with if someone wants to take an in-depth look at it.
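One possible workaround, sketched under assumptions rather than taken from pyPdf itself: replace the indirect /Rotate entry with its resolved value before calling rotateClockwise/rotateCounterClockwise, so that `_rotate`'s `self.get("/Rotate", 0)` receives a plain number instead of an IndirectObject. This relies on pyPdf/PyPDF2 objects exposing `getObject()` (an IndirectObject returns the object it references). The helper names `resolve_pdf_value` and `inline_rotate` are hypothetical.

```python
def resolve_pdf_value(value):
    """Follow an indirect reference if `value` has a getObject() method.

    Assumption: like pyPdf/PyPDF2 IndirectObject, getObject() returns the
    referenced object; plain Python values are returned unchanged.
    """
    get_object = getattr(value, "getObject", None)
    return get_object() if callable(get_object) else value


def inline_rotate(page, key="/Rotate", wrap=int):
    """Store a direct (resolved) number under `key` instead of an indirect reference.

    pyPdf requires dictionary keys and values to be PDF objects, so the
    intended call there is (an assumption, not checked against every version):
        from pyPdf.generic import NameObject, NumberObject
        inline_rotate(page, NameObject("/Rotate"), NumberObject)
    """
    if key in page:
        # dict.get bypasses any automatic resolution a dict subclass might do,
        # returning the raw stored value (e.g. an IndirectObject).
        raw = dict.get(page, key)
        page[key] = wrap(resolve_pdf_value(raw))
    return page
```

After `inline_rotate(mypage, NameObject("/Rotate"), NumberObject)`, a call such as `mypage.rotateCounterClockwise(90)` should no longer hit the `'IndirectObject' and 'int'` TypeError, because `_rotate` then adds the angle to a NumberObject. The unreferenced indirect object stays in the file but is no longer used by the page.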
You need to apply getObject to an instance of IndirectObject, so in your case it should be