PHP4: Json_encode method which accepts multi byte chars


In my company we have a web service to send data from very old projects to fairly new ones. The old projects run PHP 4.4, which has no native json_encode function, so we use the PEAR class Service_JSON instead: http://www.abeautifulsite.net/using-json-encode-and-json-decode-in-php4/

Today I found out that this class cannot deal with multi-byte characters, because it makes extensive use of ord() to get character codes from the string and replace the characters. There is no mb_ord() in these PHP versions (it was only added in PHP 7.2). The class also uses $string{$index} to access the character at an index; I'm not completely sure whether this supports multi-byte characters.
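To illustrate the concern, here is a minimal sketch, assuming UTF-8 input and that mbstring.func_overload is not active. The variable names are made up for this example. Both ord() and offset access work on single bytes, so a two-byte character such as ß shows up as two separate values:

```php
<?php
// "ß" written as its two UTF-8 bytes, so the example does not depend on the file encoding.
$eszett = "\xC3\x9F";

// strlen() counts bytes, not characters.
$byteCount = strlen($eszett);

// Offset access returns one byte at a time; ord() gives that byte's numeric value.
$first  = ord($eszett[0]); // 0xC3
$second = ord($eszett[1]); // 0x9F

printf("%d bytes: 0x%02X 0x%02X\n", $byteCount, $first, $second);
```

The square-bracket form $eszett[0] is used here because it behaves like $string{$index} but also works in current PHP versions, where the curly-brace form has been removed.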

// Excerpt from the encode() method

// STRINGS ARE EXPECTED TO BE IN ASCII OR UTF-8 FORMAT
$ascii = '';
$strlen_var = $this->strlen8($var);

/*
 * Iterate over every character in the string,
 * escaping with a slash or encoding to UTF-8 where necessary
 */
for ($c = 0; $c < $strlen_var; ++$c) {

    $ord_var_c = ord($var{$c});
    // Here comes a switch which replaces chars according to their hex code and writes them to $ascii

We call it like this:

$Service_Json = new Service_JSON();
$data = $Service_Json->encode('Marktplatz, Hauptstraße, Endingen');
echo $data; //prints "Marktplatz, Hauptstra\u00dfe, Endinge". The n is missing

We worked around this problem by setting up another web service which receives serialised arrays and returns a JSON-encoded string. This service runs on a modern machine, so it uses PHP 5.4. But this "solution" is pretty awkward and I should look for a better one. Does anyone have an idea?

Problem description

German umlauts are replaced properly, BUT the string is then cut off at the end because ord() returns the wrong characters. mb_strlen() does not change anything; it gives the same length as strlen() in this case.

The input string was "Marktplatz, Hauptstraße, Endingen"; the n at the end was cut off. The ß was correctly encoded to \u00df. For every umlaut, it cuts off one more character at the end.
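One plausible explanation for this pattern (an assumption, not confirmed from the library source) is that the loop bound is a character count while the loop body indexes bytes. The sketch below reproduces the symptom with a hypothetical helper, copy_bytes(), assuming UTF-8 input and an available mbstring extension:

```php
<?php
// Hypothetical helper: copies $limit bytes one at a time,
// similar in spirit to a byte-wise encoding loop.
function copy_bytes($input, $limit)
{
    $out = '';
    for ($i = 0; $i < $limit; ++$i) {
        $out .= $input[$i];
    }
    return $out;
}

// The ß is written as its two UTF-8 bytes to avoid file-encoding issues.
$street = "Marktplatz, Hauptstra\xC3\x9Fe, Endingen";

$charCount = mb_strlen($street, 'UTF-8'); // counts characters
$byteCount = strlen($street);             // counts bytes

// Bounding the byte-wise loop by the character count drops the last byte.
$truncated = copy_bytes($street, $charCount);
$complete  = copy_bytes($street, $byteCount);

echo $truncated, "\n";
echo $complete, "\n";
```

Each two-byte character widens the gap between the two counts by one, which would match the observation that every umlaut costs one more character at the end.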

It's also possible that the cause is our old database encoding, but the replacement itself works correctly, so I guess it's the ord() function.

1 answer

Powerriegel (best answer):

A colleague found out that

mb_strlen($var, 'ASCII');

solves the problem. We had an older version of the library in use, which called plain mb_strlen() without an encoding argument. This fix seems to do the same as your mb_convert_encoding().

Problem is solved now. Thank you very much for your help!
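For reference, a byte-length helper along these lines could look like the sketch below. The function name byte_length() is hypothetical and not taken from the library. It uses the '8bit' encoding, which mbstring provides for counting raw bytes; the 'ASCII' argument above presumably has the same effect because ASCII is a single-byte encoding.

```php
<?php
// Hypothetical helper: returns the length of $str in bytes.
function byte_length($str)
{
    if (function_exists('mb_strlen')) {
        // '8bit' treats every byte as one unit, independent of the
        // configured internal encoding.
        return mb_strlen($str, '8bit');
    }
    // Without mbstring, strlen() already counts bytes.
    return strlen($str);
}

// The ß is written as its two UTF-8 bytes to avoid file-encoding issues.
$street = "Marktplatz, Hauptstra\xC3\x9Fe, Endingen";
$length = byte_length($street);

echo $length, "\n";
```

Using a byte count as the loop bound keeps it consistent with byte-wise access such as ord($var{$c}).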