How can some user generated text be safely written out on a webpage?
Is there some complete list of characters that needs to be escaped?
The ",+,: -character should probably be escaped, but there is probably a more comprehensive list of what needs to be done.
I am thinking about exploits that insert JavaScript or other content that will redirect the page or mess things up. The younger generation has so much creativity.
This vulnerability is called cross-site scripting (XSS). Most programming languages provide functions that do the escaping for you. For example, in PHP you can use
htmlspecialchars()
to escape user text before it is written into the page. It converts &, <, > and quote characters into HTML entities. Other languages have similar functionality.
This gets more complicated if you want to allow users only a subset of HTML (e.g. a forum where users may format their posts to a limited degree). Then you actually have to parse the text and decide what to allow and what not to allow. There are a variety of engines that will do this for you (e.g. Markdown, which SO uses).
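As a rough illustration of both cases, here is a minimal sketch in Python rather than PHP, using the standard library's html.escape as the counterpart to htmlspecialchars(). The function names render_comment and render_limited_html and the ALLOWED_TAGS list are made up for this example. The second function is a deliberately simple "escape everything, then restore a few exact tags" approach. It is not how Markdown engines work, and it does not check that tags are balanced, so a maintained sanitizer is probably the better choice for real use.

```python
import html
import re


def render_comment(user_text: str) -> str:
    """Escape user text so it is shown literally inside an HTML element."""
    # html.escape replaces &, < and >; quote=True also replaces " and '.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


# Hypothetical allow-list: attribute-free formatting tags only.
ALLOWED_TAGS = ("b", "i", "em", "strong", "code")
_ALLOWED = re.compile(r"&lt;(/?)(%s)&gt;" % "|".join(ALLOWED_TAGS))


def render_limited_html(user_text: str) -> str:
    """Escape everything, then restore only exact, attribute-free allowed tags."""
    escaped = html.escape(user_text, quote=True)
    # Only sequences like &lt;b&gt; or &lt;/b&gt; are turned back into tags;
    # anything carrying attributes (e.g. onclick=...) stays escaped.
    return _ALLOWED.sub(r"<\1\2>", escaped)


if __name__ == "__main__":
    print(render_comment('<script>alert("x")</script>'))
    print(render_limited_html('<b>hi</b> <script>x</script>'))
```

Because the allow-list step runs on already escaped text, a tag with attributes such as <b onclick="..."> never matches the pattern and is displayed as plain text instead of being interpreted by the browser.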