parsing a remote pdf file with Python3 & PyPDF2

Question

2.7k views · Asked by Tom Liu on 24 June 2015 at 16:16

I need to parse a remote PDF file. With PyPDF2, I expected this to work as PdfReader(f), where f = urllib.request.urlopen("some-url").read(). However, PdfReader cannot use f directly, and it seems that f has to be decoded first. What argument should be passed to decode(), or is some other method needed?

There are 2 answers

**Nitin Bhojwani** · Answer 1 · 2015-12-08T14:48:39+00:00

You need to use:

f = urllib.request.urlopen("some-url").read()

Add these lines after the line above (on Python 3, read() returns bytes, so use io.BytesIO; the StringIO module exists only on Python 2):

from io import BytesIO

f = BytesIO(f)

and then read using PdfReader as:

reader = PdfReader(f)

Also see: Opening pdf urls with pyPdf
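
To illustrate why the wrapping step matters, here is a small sketch using only the standard library. PdfReader accepts a path or a binary file-like object, while read() returns a plain bytes object. The raw value below is a made-up placeholder, not a real PDF.

```python
from io import BytesIO

# Hypothetical stand-in for the bytes returned by urlopen(...).read().
raw = b"%PDF-1.4 placeholder bytes"

# A bytes object has no file-like interface (no read() or seek()).
print(hasattr(raw, "read"), hasattr(raw, "seek"))

# BytesIO wraps the same data in a readable, seekable binary stream.
stream = BytesIO(raw)
print(stream.seekable(), stream.read(5))
```

No decode() call is involved: decoding would turn the bytes into text, whereas a PDF parser needs the binary data unchanged.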

**celsowm** · Answer 2 · 2024-03-29T19:41:53+00:00

The bytes do not need to be decoded; it is enough to wrap them in a BytesIO object:

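import urllib.request  # importing urllib alone does not load the request submodule
import PyPDF2
from io import BytesIO
f = urllib.request.urlopen("https://mypdf.pdf").read()  # raw bytes of the PDF
pdf_bytes = BytesIO(f)  # seekable in-memory binary stream
pdf_reader = PyPDF2.PdfReader(pdf_bytes)  # PdfFileReader is the older, deprecated name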
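
The same approach can be packaged as a pair of small helpers. This is a minimal sketch, not a definitive implementation: the function names fetch_pdf_stream and count_remote_pdf_pages and the 30-second timeout are illustrative assumptions, and the second helper assumes a PyPDF2 version that provides PdfReader and reader.pages.

```python
import urllib.request
from io import BytesIO


def fetch_pdf_stream(url, timeout=30):
    """Download url and return its body as a seekable in-memory stream."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()  # bytes, not str, so no decode() is needed
    return BytesIO(data)


def count_remote_pdf_pages(url):
    """Return the page count of the PDF at url (requires PyPDF2)."""
    # Imported here so fetch_pdf_stream stays usable without PyPDF2 installed.
    from PyPDF2 import PdfReader

    reader = PdfReader(fetch_pdf_stream(url))
    return len(reader.pages)
```

Calling count_remote_pdf_pages("some-url") downloads the whole file into memory before parsing, which is fine for ordinary documents but may be worth reconsidering for very large PDFs.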
