PHP 5.3, Suhosin and UTF-8

I'm struggling to find a solution to keep using the Suhosin patch and make it work with UTF-8 form submissions. This is the very simple test I made:

<?php var_dump($_POST); ?>
<form method="post">
    <input name="test" type="text"/>
    <input type="submit" />
</form>

using the string iñtërnâtiônàlizætiøn. I send UTF-8 headers from the server, set PHP's default_charset to UTF-8, and enabled the mb* function overloading. With the Suhosin patch enabled, the submitted value comes back corrupted. As soon as I disable the patch and re-submit the form, everything works as it should.

UPDATE

I did more tests just to be sure:

$test = $_POST['test'];

var_dump(mb_detect_encoding($test, "UTF-8", true));

// Returns true if $string is valid UTF-8 and false otherwise.
function is_utf8($string) {

    // From http://w3.org/International/questions/qa-forms-utf-8.html
    return (bool) preg_match('%^(?:
      [\x09\x0A\x0D\x20-\x7E]            # ASCII
    | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
    |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
    | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
    |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
    |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
    | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
    |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
    )*$%xs', $string);

} // function is_utf8
var_dump(is_utf8($test));

Both tests failed with the Suhosin patch enabled and passed otherwise. The question is: is this a bug, or is it the expected behaviour? Is there a configuration parameter for the Suhosin patch that affects how multibyte strings are handled?
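To see what actually arrives, it can help to look at the raw bytes rather than at the rendered page. The following is a minimal sketch, not part of the original test: `describe_bytes` is a hypothetical helper name, and it assumes the mbstring extension is loaded. It reports whether the value is valid UTF-8 and prints it as hex, so a corrupted value is visible regardless of the charset the browser picks.

```php
<?php
// Hypothetical helper (not from the original test): summarise a value's bytes.
function describe_bytes($value) {
    $hex = bin2hex($value);
    return array(
        // mb_check_encoding() returns true only for well-formed UTF-8 here.
        'valid_utf8'  => mb_check_encoding($value, 'UTF-8'),
        // Derived from the ASCII hex string so the count stays a byte count
        // even if mbstring.func_overload replaces strlen() with mb_strlen().
        'byte_length' => (int) (strlen($hex) / 2),
        'hex'         => $hex,
    );
}

// Only dump when the form was actually submitted.
if (isset($_POST['test'])) {
    var_dump(describe_bytes($_POST['test']));
}
```

For a correctly received "ñ" the hex would be `c3b1`; any other byte sequence at that position suggests the value was altered before the script saw it.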

The only option I see at this point is to disable the patch, unless a brilliant mind gives the right advice.

UPDATE 2

GET strings don't get corrupted and are displayed correctly in the browser. Only POST values are affected at the moment.
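Since only POST is affected, one way to narrow it down is to compare the request body as it was sent with the value PHP places in `$_POST`. This is a rough sketch under two assumptions: the form uses the default `application/x-www-form-urlencoded` enctype (`php://input` is not available for `multipart/form-data`), and `raw_field` is a hypothetical helper written for this illustration. It decodes the field by hand so that no input filter or mbstring conversion is involved.

```php
<?php
// Hypothetical helper: pull one field out of a urlencoded request body
// without going through PHP's own POST parsing.
function raw_field($body, $name) {
    foreach (explode('&', $body) as $pair) {
        $parts = explode('=', $pair, 2);
        if (urldecode($parts[0]) === $name) {
            return isset($parts[1]) ? urldecode($parts[1]) : '';
        }
    }
    return null; // field not present in the body
}

if (isset($_POST['test'])) {
    $sent = raw_field(file_get_contents('php://input'), 'test');
    // If the two hex strings differ, the value was changed after it
    // reached PHP and before it was stored in $_POST.
    var_dump(bin2hex((string) $sent), bin2hex($_POST['test']));
}
```

If the two dumps match, the corruption more likely happens in the browser or on output rather than in PHP's input handling.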

There are 3 answers

Tom (best answer)

From a Google search, I found http://algorytmy.pl/doc/php/ref.mbstring.php which mentions

Beginning with PHP 4.3.3, if enctype for HTML form is set to multipart/form-data and mbstring.encoding_translation is set to On in php.ini the POST'ed variables and the names of uploaded files will be converted to the internal character encoding as well. However, the conversion isn't applied to the query keys.

This doesn't really mean much to me, but it does mention POST variables which seems to be the crux of the issue.

I found that if I set the following in my Apache virtual host, I could reproduce your problem:

php_admin_value mbstring.language       "Neutral"
php_admin_value mbstring.encoding_translation   "On"
php_admin_value mbstring.http_input     "UTF-8"
php_admin_value mbstring.http_output    "UTF-8"
php_admin_value mbstring.detect_order   "auto"
php_admin_value mbstring.substitute_character   "none"
php_admin_value mbstring.internal_encoding "UTF-8"
php_admin_value mbstring.func_overload "7"
php_admin_value default_charset "UTF-8"

For reference, this was the PHP test page I used to reproduce the issue:

<!DOCTYPE html>
<html>
<head>
</head>
<body>
<pre><?php echo $_POST['test'];?></pre>
<form method="post">
    <input name="test" type="text"/>
    <input type="submit" />
</form>
Test string to use: iñtërnâtiônàlizætiøn
</body>
</html>

I tried commenting out the following mbstring setting (or turning it off):

; Disable HTTP Input conversion (PHP 4.3.0 or higher)
mbstring.encoding_translation = Off

This seems to fix the issue, although I don't understand why, since the internal character encoding is already UTF-8.

Another oddity I noticed: if I set these mbstring values directly in php.ini (instead of in the Apache virtual host), I was unable to reproduce the issue with encoding_translation, so it seems to be a problem only when php_admin_value is used.
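To check which values are really in effect for a given virtual host, whether they come from php.ini or from php_admin_value, the running script can report them itself. A small sketch, where `effective_settings` is a hypothetical helper name and the list of directives is simply the one from the configuration above:

```php
<?php
// Hypothetical helper: read the effective value of each directive
// as seen by the currently running script.
function effective_settings(array $names) {
    $result = array();
    foreach ($names as $name) {
        // ini_get() returns false if the directive does not exist in this build.
        $result[$name] = ini_get($name);
    }
    return $result;
}

$names = array(
    'mbstring.encoding_translation',
    'mbstring.http_input',
    'mbstring.internal_encoding',
    'mbstring.func_overload',
    'default_charset',
);

foreach (effective_settings($names) as $name => $value) {
    echo $name, ' = ', var_export($value, true), "\n";
}
```

Running this once with the settings in php.ini and once with them in the virtual host would show whether the two setups really end up with the same effective configuration.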

Tobias

Have you tried this?

<form accept-charset="UTF-8" method="post">

-> http://www.razorvine.net/test/utf8form/utf8pageform.html

Roshan Wijesena

Did you try adding the following meta tag to your HTML page?

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">