I'm trying to understand XSS attacks. I learnt that I should use htmlspecialchars() whenever outputting something to the browser that came from user input. The code below works fine.
What I don't understand is whether I need to use htmlspecialchars() here when echoing $enrollmentno.
<?php
$enrollmentno = (int)$_POST['enrollmentno'];
echo "<div style='border-radius:45px; border-width: 2px; border-style: dashed; border-color: black;'><center><h4><b>$enrollmentno</b></h4></center></div>";
$clink = "http://xyz/$enrollmentno/2013";
echo "<iframe src='$clink' width='1500' height='900' frameBorder='0'></iframe>";
?>
If I do something like
$safe = "<div style='border-radius:45px; border-width: 2px; border-style: dashed; border-color: black;'><center><h4><b>$enrollmentno</b></h4></center></div>";
echo htmlspecialchars($safe, ENT_QUOTES);
It doesn't render the HTML; the markup shows up as literal text instead.
I'm not sure if I have to use HTMLPurifier here. Does HTMLPurifier retain the HTML formatting while preventing XSS?
Update
echo "<div style='border-radius:45px; border-width: 2px; border-style: dashed; border-color: black;'><center><h4><b>".htmlspecialchars ($enrollmentno)."</b></h4></center></div>";
Does the trick!
Any time you use arbitrary data in the context of HTML, you should be using htmlspecialchars(). The reason for this is that it prevents your text content from being treated as HTML, which could potentially be malicious if coming from outside users. It also ensures you are generating valid HTML that browsers can handle consistently.
Suppose I want the text "8 > 3" to appear in HTML. To do this, my HTML code would be 8 &gt; 3. The > is encoded as &gt; so that it isn't misinterpreted as part of a tag.
Now, suppose I am making a web page about how to write HTML, and I want the user to see a paragraph element's markup, <p> and </p> included, as literal text.
If I don't want <p> and </p> to be interpreted as an actual paragraph, but as text, I need to encode them as &lt;p&gt; and &lt;/p&gt;. htmlspecialchars() does that. It allows you to insert arbitrary text into an HTML context in a safe way.
Now, in your second example, you passed the entire <div> string, markup and all, through htmlspecialchars(). This does exactly what you asked it to do. You gave it some text, and it encoded that. If you wanted it as HTML, you should have just echoed it.
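To make the difference concrete, here is a minimal sketch of the encoding described above. The helper name escapeHtml(), the sample paragraph text, and the value 12345 are assumptions made for illustration; only htmlspecialchars() itself is the built-in function under discussion.

```php
<?php
// Hypothetical helper name; it only wraps the built-in htmlspecialchars().
function escapeHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

// Text content: characters that are special in HTML become entities.
echo escapeHtml('8 > 3'), "\n";
echo escapeHtml('<p>This is a paragraph</p>'), "\n"; // sample text, assumed

$enrollmentno = 12345; // stand-in for (int)$_POST['enrollmentno']

// Encoding the whole string turns your own tags into visible text.
$whole = escapeHtml("<h4><b>$enrollmentno</b></h4>");

// Encoding only the inserted value keeps your markup intact.
$valueOnly = '<h4><b>' . escapeHtml((string) $enrollmentno) . '</b></h4>';

echo $whole, "\n", $valueOnly, "\n";
```

The first form is what the second example in the question does; the second form matches the approach in the question's update.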
Now, if you need to display HTML as HTML and it comes from an untrusted source (i.e. not you), then you need tools like HTMLPurifier. You do not need this if you trust the source. Running all your output through htmlspecialchars() doesn't magically make things safe. You only need it when inserting arbitrary text data. A good use case is displaying a review together with the name of the user who wrote it. In this case, both the username and review text can contain whatever that user typed in, and they will be encoded correctly for use in HTML.
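A rough sketch of such a case might look like the following. The function name renderReview(), its parameters, and the surrounding markup are hypothetical and chosen only for illustration; the point is that the markup is yours and only the user-supplied values are encoded.

```php
<?php
// Hypothetical function: the markup is trusted (written by you),
// while $username and $reviewText are arbitrary user-supplied text.
function renderReview(string $username, string $reviewText): string
{
    return '<div class="review">'
        . '<h4>' . htmlspecialchars($username, ENT_QUOTES, 'UTF-8') . '</h4>'
        . '<p>' . htmlspecialchars($reviewText, ENT_QUOTES, 'UTF-8') . '</p>'
        . '</div>';
}

// Any tags typed by the user end up as visible text, not as live HTML.
echo renderReview('Bob <admin>', 'Great product! <script>alert(1)</script>'), "\n";
```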