How can I get the content of a URL and write it to a new file using HTMLSession in Python?


With requests (as typically used alongside BeautifulSoup), we use response.content to get the raw bytes of the URL and write them to a new file. What should we use if we fetch the page with HTMLSession from requests_html instead?

For example,

import requests
from urllib.parse import urlparse
from requests_html import HTMLSession

session = HTMLSession()

# Specify the URL of the PDF here
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
pdf_title = "xyz.pdf"  # output filename
r = session.get(URL, allow_redirects=True)
with open(pdf_title, "wb") as new_pdf:
    print(f"Begin writing to {pdf_title}")
    new_pdf.write(r.html.content)  # This line is not working

There is 1 answer

Answer by Tim Roberts (best answer)

This is all you need: plain requests is enough, and the raw bytes of the response are in r.content. When I run it, though, I get "request forbidden by administrative rules". Presumably, you have the key to get past this.

import requests

pdf_title = "xyz.pdf"
URL = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
r = requests.get(URL, allow_redirects=True)
# r.content holds the raw response body as bytes, so open the file in binary mode
with open(pdf_title, "wb") as new_pdf:
    new_pdf.write(r.content)
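
If you would rather keep HTMLSession, the same idea should carry over: the object returned by session.get() is still a requests-style response, so the bytes are on r.content rather than r.html.content. Below is a minimal sketch under that assumption. The helper name save_url and the function main are made up for illustration, and the raise_for_status() call is an addition that is not in the answer above; it is there so that an error status (such as the "forbidden" response mentioned earlier) raises an exception instead of being saved as a broken PDF.

```python
from pathlib import Path


def save_url(session, url, path):
    """Fetch url with session and write the raw response bytes to path.

    session is any object with a requests-style get() method, for example
    requests.Session() or requests_html.HTMLSession().
    Returns the number of bytes written.
    """
    r = session.get(url, allow_redirects=True)
    # Raise on HTTP error statuses instead of saving an error page.
    r.raise_for_status()
    # The bytes live on the response itself (r.content), not on r.html.
    Path(path).write_bytes(r.content)
    return len(r.content)


def main():
    # Imported here so save_url has no hard dependency on requests_html.
    from requests_html import HTMLSession

    url = "https://academic.oup.com/qje/article/126/4/1593/17089543/qjr041.pdf"
    written = save_url(HTMLSession(), url, "xyz.pdf")
    print(f"Wrote {written} bytes to xyz.pdf")
```

Calling main() runs the download. Whether the site actually serves the PDF still depends on having access to it.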