Bleach strips non-whitelisted tags from HTML, but leaves child nodes, e.g.
>>> import bleach
>>> bleach.clean('<a href="">stays</a>', strip=True, tags=[])
'stays'
>>>
How can the entire element along with its children be removed?
You should use lxml. Bleach is simply for cleaning data and ensuring security/safety in the markup you store. You can use lxml to parse structured data like HTML or XML. Consider a simple HTML file:
On a related note, if you want to look into some more advanced parsing with lxml, one of my oldest questions on here was about getting Python to parse XML and write Python code out of it: "Writing a Python tool to convert XML to Python?"