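from lxml import html

# Parse the file and grab the root <html> element.
root = html.parse("hello_world.html").getroot()
# encoding="unicode" returns a str instead of bytes, so the output prints cleanly.
print(html.tostring(root, encoding="unicode"))
# <html><body><p>Hello, World!</p></body></html>

# Find the <p> inside <body> and remove it together with its contents.
p = root.find("body/p")
p.drop_tree()
print(html.tostring(root, encoding="unicode"))
# <html><body></body></html>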
You should use lxml. Bleach is only for cleaning data and ensuring the safety of the markup you store. lxml, on the other hand, can parse structured data such as HTML or XML. Consider a simple HTML file, such as the hello_world.html parsed in the snippet above.
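As a rough, self-contained sketch of the same idea, the example below parses markup from a string instead of a file and contrasts drop_tree() with drop_tag(). The markup string and the variable names are made up for illustration; only html.fromstring, find, findall, drop_tag, drop_tree and html.tostring come from lxml.

```python
from lxml import html

# Hypothetical markup, inlined so the example does not depend on hello_world.html.
markup = "<html><body><p>Hello, <b>World</b>!</p><p>Bye</p></body></html>"
root = html.fromstring(markup)

# drop_tag() removes only the tag itself; its text and tail are merged into the parent.
root.find("body/p/b").drop_tag()

# drop_tree() removes the element together with everything inside it.
root.findall("body/p")[1].drop_tree()

# encoding="unicode" makes tostring return a str rather than bytes.
result = html.tostring(root, encoding="unicode")
print(result)
```

After this runs, the first paragraph should read "Hello, World!" without the b tag, and the second paragraph should be gone entirely. Use drop_tag() when you want to keep the text and drop_tree() when you want to discard it.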
On a related note, if you want to look into more advanced parsing with lxml, one of my oldest questions on here was about getting Python to parse XML and write Python code out of it: "Writing a Python tool to convert XML to Python?"