I'm trying to use bleach to escape HTML tags. It works just fine, unless I'm trying to insert a code snippet as part of the page content. The snippet is inserted like this:
<pre>
<code>
Code sample
</code>
</pre>
The code sample may contain HTML tags. How can I make bleach not escape tags if they are inside <pre><code>? I know I can whitelist some tags, but it seems there is no way to whitelist all tags when they are inside the code block and blacklist them in other cases. The outer HTML markup is produced from Markdown.
Moreover, bleach escapes all < and > signs, so when they occur in the code snippet, the rendered result looks like this:
for (auto a = 0; i &lt; 10; ++i)
If bleach is not capable of this, could you advise another escaper that can do what I need?
You want to whitelist the child tags of <pre> and <code>. From what I can infer from the documentation, you have to either list the tags you want to whitelist one by one, or use a callable that is invoked every time a tag is encountered.
See the section of the documentation named Callable Filters.
A possible solution to your problem is to pass a function to bleach.clean that checks whether the tag encountered by the clean method is a child of the code HTML tag. You will have to parse the HTML there; you can use an HTML parser for that, along with TreeBuilder from the xml.etree package. Here is an example in a different answer.
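To make the "check whether you are inside the code tag" idea concrete, here is a minimal sketch that uses only the standard library's html.parser instead of bleach. It is not bleach's API and not a security-audited sanitizer. The names ALLOWED_TAGS, CodeAwareEscaper and clean_html are hypothetical, and the whitelist is just an assumption for illustration. Outside <code>, whitelisted tags pass through (with attributes dropped) and everything else is escaped; inside <code>, every tag is shown literally, and entities that Markdown already produced are kept as they are, so they are not escaped a second time.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, chosen only for this illustration.
ALLOWED_TAGS = {"p", "pre", "code", "em", "strong"}


class CodeAwareEscaper(HTMLParser):
    """Rebuilds the markup, escaping tags depending on whether we are inside <code>."""

    def __init__(self):
        # convert_charrefs=False so that existing entities such as &lt; reach
        # handle_entityref/handle_charref instead of being decoded into text.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.in_code = False

    def handle_starttag(self, tag, attrs):
        if not self.in_code and tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose
            if tag == "code":
                self.in_code = True
        else:
            # Disallowed tag, or any tag inside <code>: show it as literal text.
            self.out.append(escape(self.get_starttag_text(), quote=False))

    def handle_endtag(self, tag):
        if self.in_code and tag == "code":
            self.in_code = False
            self.out.append("</code>")
        elif not self.in_code and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")
        else:
            self.out.append(escape(f"</{tag}>", quote=False))

    def handle_data(self, data):
        # Plain text, including a bare "<" that does not start a tag.
        self.out.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")  # keep existing entities, no double escaping

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def clean_html(markup):
    parser = CodeAwareEscaper()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(clean_html("<p>Hi <blink>x</blink></p>"))
print(clean_html("<pre><code><b>x</b> i &lt; 10</code></pre>"))
```

The first call escapes the non-whitelisted tag but keeps <p>; the second leaves the already-escaped &lt; untouched and turns the raw <b> inside the code block into visible text. Comments in the input are silently dropped, since handle_comment is not overridden.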