HTML Purifier against external resources


I am pretty new to XSS and HTML Purifier (I have been researching them for a few days), although I have been a programmer and web developer for many years. (Yes, I know it is a shame that I had not come across XSS before. I had thought about similar issues, but never researched them in depth.)

AFAIK, attackers can load their evil external JS through places like an IMG's SRC, and through other valid tags' attributes as well. So I came up with an idea: if I prohibit users' HTML from loading resources outside my domain (and purify what I already have in my documents/database), can I say my site is free from XSS attacks?

Let me rephrase and structure my queries.

First, I am going to build a website that allows users to input HTML code (directly or through upload). Quite typical.

I will use HTML Purifier to clean the users' code.

The first question: (Q1) Even after using HTML Purifier, attackers can still load their evil scripts via valid HTML attributes. Is this true?

And (Q2) I suppose I cannot allow the <script> tag in the HTML Purifier settings, as any evil thing can happen in the JS within the <script> tag. Is this true?

(Q3) Can HTML Purifier strip out all links, anywhere in the text, that do not refer to domains I trust?

And finally, a theoretical issue: (Q4) If the text has been run through HTML Purifier and contains no external links, can we say that it is absolutely free from XSS?

P.S. One more thing: I would like to allow certain (very limited) JS. Do you think it is OK to convert (my custom) tags like [ajax:getUserName] into real JS in the final processing step?

Thanks very much!

Answer by Edward Z. Yang:

Let's assume for a moment that HTML Purifier has no security vulnerabilities. (Generally, it's a bad bet to assume software is not buggy, so beware.)

Q1: If you use HTML Purifier as described by the documentation (use it to purify HTML, put the result of HTML Purifier only in HTML contexts, configure your character encoding properly), then attackers should not be able to load their scripts. It is "safe" out of the box.

Q2: HTML Purifier will not let you allow the <script> tag; it will reject it as unknown.

Q3: Unfortunately, HTML Purifier currently only directly supports blacklisting strings in host names (using %URI.HostBlacklist) and allowing only local links (using %URI.DisableExternal). But you could define a URI filter for a more complicated policy.
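
To make "a more complicated policy" concrete, here is a minimal sketch of the decision a trusted-domain filter would have to make. It is written in Python purely as an illustration of the logic; it is not HTML Purifier's API, and a real URI filter would be written in PHP against the library's own filter interface. The names TRUSTED_HOSTS and is_allowed_uri, and the example hosts, are assumptions made up for this sketch.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the hosts you actually trust.
TRUSTED_HOSTS = {"example.com", "cdn.example.com"}


def is_allowed_uri(uri, trusted_hosts=TRUSTED_HOSTS):
    """Return True for local (relative) URIs and for http(s) URIs
    whose host is exactly one of the trusted hosts."""
    # Browsers may treat backslashes like forward slashes, so "/\\host"
    # can behave like "//host". Reject them outright in this sketch.
    if "\\" in uri:
        return False

    parts = urlsplit(uri.strip())

    # No scheme and no host: a relative URI that stays on this site.
    if not parts.scheme and not parts.netloc:
        return True

    # Refuse schemes such as javascript: or data:.
    if parts.scheme and parts.scheme.lower() not in ("http", "https"):
        return False

    # Exact host match, so "example.com.evil.test" is not accepted.
    host = (parts.hostname or "").lower()
    return host in trusted_hosts
```

With this policy, "/images/a.png" and "https://cdn.example.com/a.js" pass, while "//evil.test/x.js", "javascript:alert(1)" and "http://example.com.evil.test/" are refused. Treat it as a starting point rather than a hardened filter; URL parsing has many corner cases.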

Q4: The no-external-links restriction is not necessary; the purified output should already be free of XSS.

PS: That is OK, as long as you properly escape any user input that is included in the JS.
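
As a rough sketch of what "properly escape" could look like for the custom-tag idea: only expand actions from a fixed allowlist, and pass any user-supplied text into the generated JS as an encoded string literal. The example is in Python only for illustration (the same approach applies in PHP). Everything beyond the [ajax:getUserName] tag itself is an assumption: the optional argument syntax, the siteAjax.getUserName function name, and the helper names are all hypothetical.

```python
import json
import re

# Hypothetical allowlist: tag action -> JS function defined by the site itself.
ALLOWED_ACTIONS = {"getUserName": "siteAjax.getUserName"}

# Matches [ajax:action] or, as an assumed extension, [ajax:action some text].
TAG_RE = re.compile(r"\[ajax:(\w+)(?:\s+([^\]]*))?\]")


def js_string_literal(value):
    """Encode text as a JS string literal that is safe inside a <script> block."""
    # json.dumps escapes quotes, backslashes, control and non-ASCII characters.
    literal = json.dumps(value)
    # Also escape characters that could end the script element early.
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


def expand_tag(match):
    action, arg = match.group(1), match.group(2)
    js_name = ALLOWED_ACTIONS.get(action)
    if js_name is None:
        return ""  # unknown action: drop the tag instead of guessing
    args = js_string_literal(arg) if arg is not None else ""
    return "<script>{}({});</script>".format(js_name, args)


def expand_custom_tags(purified_html):
    """Run this on HTML that has already been purified."""
    return TAG_RE.sub(expand_tag, purified_html)
```

The key design choice is that the user never supplies code, only a choice among actions you wrote yourself plus, at most, data that is encoded before it reaches the script.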