Let's say I get some html successfully using the following:
from requests_html import HTMLSession
session = HTMLSession()
url = "https://example.com"
html = session.get(url).html
Now I want to modify that html and then save it to a local file. How would I do that?
I want to update the href attribute, but this doesn't do it:
for a in html.find("a"):
    link = a.attrs['href']  # https://example.com/page1.html
    a.attrs['href'] = "page1.html"
with open("index.html", "wb") as f:
    f.write(html.raw_html)  # the saved file still has the original links
Is there a way to do this with requests-html, or do I have to use lxml, bs4, or pyquery to edit the HTML?
It's better to manually reconstruct the HTML from the modified elements.
In the requests_html library, the .find() method returns a list of Element objects. These Element objects represent the HTML elements as they were at the time of parsing. Modifying them does not change the underlying HTML source text, so raw_html still holds the original markup.
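As a rough sketch of one way to get an editable copy, the snippet below uses lxml directly (one of the libraries named in the question) rather than requests_html's own Element objects. The function name localize_links, the sample markup, and the rule of keeping only the last path segment of each href are assumptions made for illustration, not part of the original post.

```python
import posixpath
from urllib.parse import urlsplit

import lxml.html


def localize_links(html_text: str) -> str:
    """Return html_text with every <a href> reduced to its file name.

    Hypothetical helper for illustration; adapt the rewrite rule as needed.
    """
    doc = lxml.html.fromstring(html_text)
    for a in doc.iter("a"):
        href = a.get("href")
        if not href:
            continue  # skip anchors without an href
        # Keep only the last path segment, e.g. ".../page1.html" -> "page1.html".
        name = posixpath.basename(urlsplit(href).path)
        if name:
            a.set("href", name)  # edits the lxml tree in place
    # Serialize the modified tree; the edits only show up in this output.
    return lxml.html.tostring(doc, encoding="unicode")


# Made-up sample document standing in for a downloaded page.
sample = (
    '<html><body>'
    '<a href="https://example.com/page1.html">Page 1</a>'
    '</body></html>'
)
result = localize_links(sample)
```

With a real page, the string passed in could be html.html from the session above, and the returned string could be written out with open("index.html", "w", encoding="utf-8"). The point is that the file is written from the serialized, modified tree rather than from raw_html.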