Given the code below, how do I use HTML Purifier to allow the entire contents to pass through? I want to allow the entire HTML document, but the html, head, style, title, body and meta tags get stripped out.
I even tried $config->set('Core.ConvertDocumentToFragment', false), but that didn't work.
Any help on where to start would be greatly appreciated.
I tried the example from "HTML Purifier - Change default allowed HTML tags configuration", but it doesn't work: I keep getting exceptions that the tags are not allowed. NOTE: I did add all the tags above to HTML.Allowed, but nothing seems to work.
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1" />
<title>Hello World - Email Template</title>
<style type="text/css">
@import url(https://fonts.googleapis.com/css?family=Open+Sans:400,600);
body{-webkit-text-size-adjust: none;-ms-text-size-adjust: none;margin: 0;padding: 0;}
</style>
</head>
<body>
<h1>Hi there</h1>
</body>
</html>
HTML Purifier by default only knows tags that are valid within a <body> context, because that's its intended use case. Basically, it doesn't actually know what a <meta>, <html>, <head> or <title> tag is, and that's a big deal, because most of its security relies on understanding the semantic underpinnings of the HTML!
There are some older Stack Overflow questions on this topic, but they don't currently have very useful answers, so after some contemplation, I think your question still has merit and am going to answer it here.
Generally, this has been discussed a few times on the HTML Purifier forums (e.g. in the thread "Allow HTML, HEAD, STYLE and BODY tags"), but in a nutshell, you can't do this without a significant amount of work, and unfortunately I'm not aware of any snippet of code that solves this problem with a simple copy and paste.
So you're going to have to dig into the guts of HTML Purifier.
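To give a rough idea of what that digging involves, here is a minimal sketch modelled on the <form> example from HTML Purifier's Customize! documentation, as I recall it. The definition ID string, the Composer autoload path and the sample input are assumptions made for illustration. It only teaches one body-level element; it is not a working configuration for <html>, <head> and friends, which would each need their own definition with a suitable content model.

```php
<?php
// Assumption: HTML Purifier was installed with Composer (ezyang/htmlpurifier).
require_once __DIR__ . '/vendor/autoload.php';

$config = HTMLPurifier_Config::createDefault();

// A customised definition needs an ID and a revision number so it can be cached.
// The ID is an arbitrary label chosen for this sketch.
$config->set('HTML.DefinitionID', 'full-document-sketch');
$config->set('HTML.DefinitionRev', 1);
// Disable the definition cache while experimenting; re-enable it afterwards.
$config->set('Cache.DefinitionImpl', null);
// Needed when whole documents (not just fragments) are being handled.
$config->set('Core.ConvertDocumentToFragment', false);

if ($def = $config->maybeGetRawHTMLDefinition()) {
    // Teach HTML Purifier the <form> element.
    $form = $def->addElement(
        'form',    // element name
        'Block',   // content set the element belongs to
        'Flow',    // allowed children
        'Common',  // attribute collection
        array(     // element-specific attributes
            'action*' => 'URI',            // the * marks a required attribute
            'method'  => 'Enum#get|post',
            'name'    => 'ID',
        )
    );
    // Forms must not be nested inside other forms.
    $form->excludes = array('form' => true);
}

$purifier = new HTMLPurifier($config);
echo $purifier->purify('<form action="/submit" method="post"><p>Hi</p></form>');
```

Each additional tag means another addElement() call with a content set, allowed children and attributes chosen to match what that tag really means, which is where most of the work lies.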
You can teach HTML Purifier most tags and associated behaviour using the instructions on the Customize! documentation page. The part most interesting for you is near the bottom: an example where <form> is taught to HTML Purifier.
You would have to do similar things with all tags outside of the <body> tag that you want to support (all the way up to <html>).
Note: Even if you add all these tags to HTML Purifier, the setting Core.ConvertDocumentToFragment that you discovered needs to be set to false (as you have done).
Alternative
If this looks like too much work, and you have other ways to sanitise the header section and body attributes of your document, you can also cut your document into pieces, sanitise the pieces separately, then carefully stick them back together.
(Or, of course, just use those other ways of sanitising for the entire document.)
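The cut-and-reassemble idea could look roughly like the sketch below. The function names sanitiseDocument() and innerHtml() are hypothetical, and both sanitisers are passed in as callables: the body one would typically wrap HTML Purifier, while the head one stands for whatever other method you trust. This version assumes UTF-8 input and rebuilds bare <html>, <head> and <body> tags, so the doctype and any attributes on those tags are dropped rather than sanitised.

```php
<?php
/**
 * Hypothetical helper: split a document, sanitise head and body separately,
 * then stick the pieces back together.
 *
 * @param string   $html         Full HTML document (assumed UTF-8, non-empty).
 * @param callable $sanitiseBody function (string $bodyHtml): string
 * @param callable $sanitiseHead function (string $headHtml): string
 */
function sanitiseDocument(string $html, callable $sanitiseBody, callable $sanitiseHead): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; DOMDocument repairs sloppy markup.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration prefix is a common trick to make libxml read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $head = $doc->getElementsByTagName('head')->item(0);
    $body = $doc->getElementsByTagName('body')->item(0);

    $headHtml = $head ? innerHtml($head) : '';
    $bodyHtml = $body ? innerHtml($body) : '';

    // Only sanitised content goes back in; wrapper tags are written fresh,
    // so attributes on <html> and <body> do not survive.
    return "<html>\n<head>\n" . $sanitiseHead($headHtml) . "\n</head>\n"
         . "<body>\n" . $sanitiseBody($bodyHtml) . "\n</body>\n</html>";
}

/** Serialise the children of a node, without the node's own tags. */
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

// Possible usage, assuming $purifier is a configured HTMLPurifier instance
// and mySanitiseHead() is your own (hypothetical) head sanitiser:
// echo sanitiseDocument($html, array($purifier, 'purify'), 'mySanitiseHead');
```

Passing the sanitisers in as callables keeps the splitting logic independent of HTML Purifier, so the head can be handled by a stricter allow-list (or discarded entirely) without touching the rest.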