I need to parse a remote PDF file. With PyPDF2, a file can be read with PdfReader(f). In my case f = urllib.request.urlopen("some-url").read(), but PdfReader cannot use f directly, and it seems that f has to be decoded first. What argument should be passed to decode(), or is some other method needed?
Parsing a remote PDF file with Python 3 & PyPDF2
Asked by Tom Liu
There are 2 answers
No decoding is needed. read() returns bytes, while the reader expects a file path or a file-like object, so wrap the bytes in a BytesIO:
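import urllib.request  # a bare "import urllib" does not reliably load the request submodule
from io import BytesIO
import PyPDF2
# read() returns the whole response body as bytes
f = urllib.request.urlopen("https://mypdf.pdf").read()
# wrap the bytes in a seekable, file-like object
pdf_bytes = BytesIO(f)
# PdfReader is the current class name; PdfFileReader was removed in PyPDF2 3.0
pdf_reader = PyPDF2.PdfReader(pdf_bytes)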
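As a slightly fuller sketch of the same BytesIO approach, the download and the parsing can be split into two small functions. The names fetch_pdf_stream and extract_text, the 30-second timeout, and the "%PDF-" header check are illustrative assumptions rather than anything PyPDF2 requires. The check is stricter than some readers, which tolerate a few stray bytes before the header. The sketch also assumes PyPDF2 2.0 or later, where PdfReader, reader.pages, and page.extract_text() are available.

```python
import urllib.request
from io import BytesIO


def fetch_pdf_stream(url, timeout=30):
    """Download url and return its body as a seekable in-memory stream.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()  # bytes, not str, so there is nothing to decode
    # Cheap sanity check (an assumption of this sketch): PDFs normally
    # begin with the "%PDF-" marker.
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    return BytesIO(data)


def extract_text(url):
    """Return the extracted text of each page of the PDF at url."""
    # Imported here so that fetch_pdf_stream can be used without PyPDF2.
    from PyPDF2 import PdfReader

    reader = PdfReader(fetch_pdf_stream(url))
    return [page.extract_text() for page in reader.pages]
```

Calling extract_text("https://mypdf.pdf") would then give one string per page. Keeping the whole file in memory is fine for small documents. For very large PDFs, writing the response to a temporary file and passing that file to PdfReader may be the better choice.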
You need to use:
Add these lines after above line:
and then read using PdfReader as:
See also: Opening pdf urls with pyPdf