How to safely write out user-generated text in XHTML


How can user-generated text be safely written out on a web page?

Is there a complete list of characters that need to be escaped?

The characters ", + and : should probably be escaped, but there is probably a more comprehensive list of what needs to be done.

I am thinking about exploits that insert JavaScript or other content that redirects the page or otherwise messes things up. The younger generation has so much creativity.

4

There are 4 answers

0
Jesse Cohen

This vulnerability is called cross-site scripting (XSS). Most programming languages provide functions that do the escaping for you; in PHP, for example, htmlspecialchars() escapes user text so that it can be written into the page without being interpreted as markup. Other languages have similar functions.
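As a rough illustration of the "similar functions" in other languages, here is a minimal sketch using `html.escape` from Python's standard library. The helper name `render_comment` and the surrounding `<p>` element are assumptions made for the example, not something from the question.

```python
from html import escape


def render_comment(user_text: str) -> str:
    """Wrap untrusted text in a paragraph, escaping it first.

    render_comment is a hypothetical helper for this example.
    """
    # quote=True (the default) escapes " and ' in addition to &, < and >,
    # so the result is also usable inside a quoted attribute value.
    return "<p>" + escape(user_text, quote=True) + "</p>"


print(render_comment('<script>alert("hi")</script>'))
# prints: <p>&lt;script&gt;alert(&quot;hi&quot;)&lt;/script&gt;</p>
```

This only covers text placed in element content or quoted attribute values; data placed inside scripts, URLs or style sheets needs context-specific handling.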

This gets more complicated if you want to allow users to use a subset of HTML (e.g. a forum where users may format their posts to a limited degree). In that case you have to parse the text and decide what to allow and what to reject. There are a variety of engines that will do this for you (e.g. Markdown, which Stack Overflow uses).
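To make the "parse and decide what to allow" idea concrete, here is a minimal sketch built on Python's standard `html.parser` module. The allowlist contents and the names `AllowlistSanitizer` and `sanitize` are assumptions for illustration. It drops every tag that is not on the list, strips all attributes, and escapes all text. It does not balance unclosed tags, so its output is not guaranteed to be well-formed XHTML; a maintained sanitizer library is likely a better choice for real use.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: simple inline formatting only.
ALLOWED_TAGS = {"b", "i", "em", "strong"}


class AllowlistSanitizer(HTMLParser):
    """Re-emit allowed tags without attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are discarded, so onclick=, style=, etc. never survive.
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives with character references decoded; escape it again.
        self.out.append(escape(data))


def sanitize(user_html: str) -> str:
    parser = AllowlistSanitizer()
    parser.feed(user_html)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


print(sanitize('<b>hi</b> <a href="javascript:x">link</a>'))
# prints: <b>hi</b> link
```

The design choice here is to rebuild the output from parsed events rather than to search the input for dangerous patterns, so anything the sanitizer does not explicitly recognize is dropped or escaped.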

2
JB Nizet

Escaping <, >, &, " and ' should be sufficient for text placed in element content or in quoted attribute values.

0
Marco

Depending on your server-side language, there are built-in functions for this.

0
Aasmund Eldhuset

(Copying my own answer to a similar question - please alert me if this is considered bad practice.)

You might want to consult the OWASP Cheat Sheet on Cross Site Scripting Prevention. It boils down to:

  • Being aware of the locations where you should not put untrusted data at all
  • Being aware of the different ways in which data should be escaped in the different kinds of locations where you can put untrusted data
  • Using whitelisting (escaping everything except specified safe characters) instead of blacklisting (only escaping specified unsafe characters)

(Read the entire document, though, rather than relying on this summary...)
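The whitelisting idea from the summary above can be sketched as follows: every character that is not on an explicit safe list is replaced by a numeric character reference. The safe set (ASCII letters, digits and space) and the function names are assumptions chosen for the example; the cheat sheet itself should be consulted for the rules that apply to each output location. Characters that XML 1.0 does not permit at all are dropped, since XHTML cannot represent them even as references.

```python
import string

# Assumed safe set for this sketch: ASCII letters, digits and space.
SAFE_CHARS = set(string.ascii_letters + string.digits + " ")


def is_xml_char(ch: str) -> bool:
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_allowlist(text: str) -> str:
    """Escape everything except SAFE_CHARS as &#xHH; references."""
    parts = []
    for ch in text:
        if ch in SAFE_CHARS:
            parts.append(ch)
        elif is_xml_char(ch):
            parts.append(f"&#x{ord(ch):X};")
        # Characters illegal in XML 1.0 (e.g. NUL) are silently dropped.
    return "".join(parts)


print(escape_allowlist("<b>it's</b>"))
# prints: &#x3C;b&#x3E;it&#x27;s&#x3C;&#x2F;b&#x3E;
```

Compared with escaping only a fixed list of dangerous characters, this approach stays safe when a character turns out to be significant in some context that the list's author did not anticipate.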