Why is there no $encoding parameter in htmlspecialchars_decode()?


If we check out the documentation of the htmlspecialchars() function in PHP, we see that it has an $encoding parameter to specify the encoding of the input string.

Now, conversely, I would expect the inverse function, htmlspecialchars_decode(), to have an $encoding parameter as well. However, it does NOT.

I want to know why exactly this is the case. There has to be some reason for not including an $encoding parameter in htmlspecialchars_decode().

Surprisingly, html_entity_decode() does have an $encoding parameter, so what's the point of including it in that function but not in htmlspecialchars_decode()?
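To confirm the premise, you can ask PHP itself for the parameter names via reflection. This is a minimal sketch assuming PHP 7.4+ (for the arrow function). `paramNames` is a hypothetical helper written only for this illustration, and the exact names reported for built-in functions may vary between PHP versions, but `encoding` should only show up for htmlspecialchars and html_entity_decode.

```php
<?php
// Hypothetical helper: return the parameter names of a function via reflection.
function paramNames(string $fn): array
{
    return array_map(
        fn (ReflectionParameter $p) => $p->getName(),
        (new ReflectionFunction($fn))->getParameters()
    );
}

// Print each function's parameter list, e.g. "html_entity_decode(string, flags, encoding)".
foreach (['htmlspecialchars', 'htmlspecialchars_decode', 'html_entity_decode'] as $fn) {
    echo $fn, '(', implode(', ', paramNames($fn)), ")\n";
}
```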


Answer by deceze

I'd have to guess here slightly, but… htmlspecialchars_decode only decodes a small handful of entities (&amp;, &lt;, &gt; and, depending on the flags, &quot; and &#039;), all of which stand for ASCII characters. So there's no need to specify a target encoding for them: these characters are encoded identically in every ASCII-compatible encoding. What if you wanted to decode to an encoding that isn't ASCII-compatible, such as UTF-16? That is probably virtually never the case, and if you really need it, you can simply convert the string's encoding before and/or afterwards.
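A small sketch of that behaviour, using an arbitrary sample string: only the special entities are decoded, while `&eacute;` passes through unchanged. The UTF-16 step is an illustrative assumption showing the "convert afterwards" route; it relies on mb_convert_encoding() from the mbstring extension and is skipped if that extension isn't loaded.

```php
<?php
// htmlspecialchars_decode() only turns &amp; &quot; &#039; &lt; &gt; back into
// their ASCII characters; other entities such as &eacute; are left untouched.
$encoded = '&lt;p class=&quot;x&quot;&gt;Caf&eacute; &amp; Bar&lt;/p&gt;';
$decoded = htmlspecialchars_decode($encoded, ENT_QUOTES);

echo $decoded, "\n"; // <p class="x">Caf&eacute; & Bar</p>

// For a non-ASCII-compatible target such as UTF-16, decode first and convert
// afterwards (requires the mbstring extension).
if (function_exists('mb_convert_encoding')) {
    $utf16 = mb_convert_encoding($decoded, 'UTF-16LE', 'UTF-8');
    echo bin2hex(substr($utf16, 0, 4)), "\n"; // 3c007000 ("<p" in UTF-16LE)
}
```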

PHP has always assumed ASCII for the things that matter to it and treated everything else as arbitrary bytes, so this function has never received any unified encoding support, and neither have many other functions.

The functions htmlspecialchars and html_entity_decode received this treatment at some point, probably because cases where the encoding does matter come up more often with them. html_entity_decode decodes a much wider range of entities, and there it does matter which encoding you decode them to: &eacute;, for example, becomes the two bytes C3 A9 in UTF-8 but the single byte E9 in ISO-8859-1.
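To illustrate why the target encoding matters here, this sketch decodes the same entity into two different charsets and prints the resulting bytes in hex. The last line shows that an entity the target charset cannot represent is left as it is.

```php
<?php
// html_entity_decode() has to produce real characters for named entities,
// so the target encoding determines the resulting bytes.
$utf8   = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'UTF-8');
$latin1 = html_entity_decode('&eacute;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

echo bin2hex($utf8), "\n";   // c3a9 (two bytes in UTF-8)
echo bin2hex($latin1), "\n"; // e9   (one byte in ISO-8859-1)

// ISO-8859-1 has no euro sign, so the entity is not decoded.
echo html_entity_decode('&euro;', ENT_QUOTES | ENT_HTML401, 'ISO-8859-1'), "\n"; // &euro;
```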

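htmlspecialchars does need to know the encoding to properly preserve the string's contents. At first glance that seems unnecessary, as it would just need to look for certain ASCII bytes to replace, but it also validates the input against the given encoding: if the string contains a byte sequence that is invalid in that encoding, it returns an empty string, unless ENT_IGNORE or ENT_SUBSTITUTE is set (ENT_SUBSTITUTE, part of the default flags since PHP 8.1, replaces such sequences with U+FFFD instead). So not passing the correct encoding will garble, or even wipe out, your non-ASCII text.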
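The following sketch shows that validation effect with a made-up ISO-8859-1 sample string. The byte \xE9 is "é" in ISO-8859-1 but, on its own, an invalid sequence in UTF-8.

```php
<?php
// "Caf\xE9" is "Café" in ISO-8859-1; the lone byte \xE9 is invalid UTF-8.
$latin1 = "Caf\xE9 <b>";

// Correct encoding: the non-ASCII byte is preserved, only "<" and ">" are escaped.
$ok = htmlspecialchars($latin1, ENT_QUOTES, 'ISO-8859-1');

// Wrong encoding, neither ENT_IGNORE nor ENT_SUBSTITUTE: the result is an empty string.
$empty = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// Wrong encoding with ENT_SUBSTITUTE: the invalid byte is replaced by U+FFFD.
$garbled = htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($ok === "Caf\xE9 &lt;b&gt;"); // bool(true)
var_dump($empty);                      // string(0) ""
```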