I need to parse a remote PDF file. With PyPDF2, a PDF can be opened with PdfReader(f), so I tried f = urllib.request.urlopen("some-url").read(). However, PdfReader cannot use this f directly, and it seems that f has to be decoded first. What argument should be passed to decode(), or is a different method needed?
Parsing a remote PDF file with Python 3 & PyPDF2
2.7k views. Asked by Tom Liu
There are 2 answers
celsowm
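No decoding is needed. read() returns bytes, and PdfReader expects a file path or a file-like object, so wrap the bytes in a BytesIO stream:
import urllib.request  # "import urllib" alone does not load the request submodule
from io import BytesIO
import PyPDF2

# Download the raw PDF bytes (placeholder URL)
f = urllib.request.urlopen("https://mypdf.pdf").read()
# Wrap the bytes in a seekable in-memory stream
pdf_bytes = BytesIO(f)
# PdfReader replaces the deprecated PdfFileReader (removed in PyPDF2 3.0)
pdf_reader = PyPDF2.PdfReader(pdf_bytes)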
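If you want something reusable, the same idea can be wrapped in two small functions. This is only a sketch, not the one right way to do it: the names fetch_pdf_stream and read_remote_pdf, the 30 second timeout, and the "%PDF-" header check are my own additions and not part of PyPDF2. The header check is there because a failed download often returns an HTML error page, which would otherwise only fail later inside the PDF parser.

```python
import urllib.request
from io import BytesIO


def fetch_pdf_stream(url, timeout=30):
    """Download url and return its content as a seekable BytesIO stream.

    Hypothetical helper: the name and the timeout default are assumptions.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        data = response.read()  # bytes, not str: there is nothing to decode()
    # A PDF file normally begins with the "%PDF-" marker; fail early otherwise.
    if not data.startswith(b"%PDF-"):
        raise ValueError("response does not look like a PDF")
    return BytesIO(data)


def read_remote_pdf(url):
    """Return a PyPDF2 PdfReader for the PDF at url (hypothetical helper)."""
    # Imported here so fetch_pdf_stream can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader
    return PdfReader(fetch_pdf_stream(url))
```

Usage would then be reader = read_remote_pdf("some-url"), after which reader.pages gives access to the individual pages.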
You need to use:
Add these lines after the line above:
Then read it using PdfReader as:
Also, refer to: Opening pdf urls with pyPdf