I'm struggling to find a solution to keep using the Suhosin patch and make it work with UTF-8 form submissions. This is the very simple test I made:
<?php var_dump($_POST); ?>
<form method="post">
<input name="test" type="text"/>
<input type="submit" />
</form>
using the string iñtërnâtiônàlizætiøn. I first enabled the UTF-8 headers on the server, set the PHP default_charset to utf-8, and enabled the mb* function overloading. As soon as I disable the Suhosin patch and re-submit the form, everything works as it should.
UPDATE
I did more tests just to be sure:
$test = $_POST['test'];
var_dump(mb_detect_encoding($test, "UTF-8", true));
// Returns true if $string is valid UTF-8 and false otherwise.
function is_utf8($string) {
// From http://w3.org/International/questions/qa-forms-utf-8.html
return (bool) preg_match('%^(?:
[\x09\x0A\x0D\x20-\x7E] # ASCII
| [\xC2-\xDF][\x80-\xBF] # non-overlong 2-byte
| \xE0[\xA0-\xBF][\x80-\xBF] # excluding overlongs
| [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2} # straight 3-byte
| \xED[\x80-\x9F][\x80-\xBF] # excluding surrogates
| \xF0[\x90-\xBF][\x80-\xBF]{2} # planes 1-3
| [\xF1-\xF3][\x80-\xBF]{3} # planes 4-15
| \xF4[\x80-\x8F][\x80-\xBF]{2} # plane 16
)*$%xs', $string);
} // function is_utf8
var_dump(is_utf8($test));
Both tests returned false with the Suhosin patch enabled and true otherwise. The question is: is this a bug or the expected behaviour? Is there a configuration parameter for the Suhosin patch that affects how multibyte strings are handled?
The only option I see at this point is to disable the patch, unless a brilliant mind gives the right advice.
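To see exactly how the value is altered, rather than only whether it is still valid UTF-8, it may help to print the submitted bytes as hex. This is a minimal sketch; hex_bytes is a hypothetical helper name, not part of the original test. In UTF-8, "ñ" should arrive as the two bytes C3 B1.

```php
<?php
// Hypothetical helper (not part of the original test page):
// render every byte of a string as a two-digit hex value.
function hex_bytes($value)
{
    if ($value === '') {
        return '';
    }
    // bin2hex() works on raw bytes; split its output into byte pairs.
    return implode(' ', str_split(bin2hex($value), 2));
}

// Reference: "iñt" encoded as UTF-8.
echo hex_bytes("i\xC3\xB1t"), "\n"; // 69 c3 b1 74

// The value as PHP received it from the form.
if (isset($_POST['test'])) {
    echo hex_bytes($_POST['test']), "\n";
}
```

Comparing the two lines with the patch enabled and disabled should show whether bytes are being dropped, replaced, or re-encoded.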
UPDATE 2
GET strings don't get corrupted and are displayed correctly in the browser. Only POST values are affected at the moment.
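Since only POST is affected, one way to narrow this down is to compare the raw request body with what ends up in $_POST. The sketch below assumes the form is sent as application/x-www-form-urlencoded (the default for the test form above) and decodes the field by hand; raw_field is a hypothetical helper name.

```php
<?php
// Hypothetical helper: extract one field from a raw urlencoded body
// without going through PHP's own request parsing.
function raw_field($body, $name)
{
    foreach (explode('&', $body) as $pair) {
        $parts = explode('=', $pair, 2);
        if (urldecode($parts[0]) === $name) {
            return isset($parts[1]) ? urldecode($parts[1]) : '';
        }
    }
    return null; // field not present in the body
}

if (isset($_POST['test'])) {
    // php://input exposes the unparsed body for urlencoded POST requests.
    $raw = raw_field(file_get_contents('php://input'), 'test');
    if ($raw !== null) {
        // true means $_POST holds the same bytes the browser sent.
        var_dump(bin2hex($raw) === bin2hex($_POST['test']));
    }
}
```

If this prints false, the bytes are being changed while PHP populates $_POST, not by the browser or the web server.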
From a Google search, I found http://algorytmy.pl/doc/php/ref.mbstring.php which mentions
This doesn't really mean much to me, but it does mention POST variables, which seem to be the crux of the issue.
I found that if I set this in my Apache virtual host, I could reproduce your problem:
For reference, this was the PHP test page I used to reproduce the issue:
I tried commenting out the following mbstring setting (or turning it off):
This seems to fix the issue, even though it doesn't make much sense to me, because the internal character encoding is UTF-8.
Another oddity I noticed was that if I set these mbstring values directly in php.ini (instead of the Apache virtual host), I was unable to reproduce the issue with encoding_translation, so it seems to be a problem only when php_admin_value is used?
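Because the behaviour seems to depend on where the values are set, it may be worth printing the values PHP actually sees when the page is served from that virtual host. This is only a sketch: the directive names are standard PHP and mbstring ini settings, but the choice of which ones to list is my assumption, and effective_settings is a hypothetical helper name.

```php
<?php
// Settings that plausibly matter here (the selection is an assumption).
$directives = array(
    'default_charset',
    'mbstring.encoding_translation',
    'mbstring.internal_encoding',
    'mbstring.http_input',
    'mbstring.func_overload',
);

// Hypothetical helper: map each directive name to its effective value.
function effective_settings(array $names)
{
    $out = array();
    foreach ($names as $name) {
        $value = ini_get($name);
        // ini_get() returns false when this build does not know the directive.
        $out[$name] = ($value === false) ? '(not available)' : $value;
    }
    return $out;
}

foreach (effective_settings($directives) as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

Running it once with the values in php.ini and once with them in the virtual host should show whether the two setups really end up with the same configuration.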