Character encoding problem - I suppose (UTF-8 - JS and Windows1250 - PHP)

So far I have solved all my problems by searching this forum, but now I have hit a wall. :) Maybe the problem is that I don't know what question to ask.

So my problem is: I have a form in which the user enters a tracking number, and in some cases this number starts with "%000xxxx". Using JS and AJAX, I POST it to a PHP endpoint. So far everything is OK; in console.log(data) I get the URL:

endpoint/trackingNumber=%000xxxx&foo=bar

The problem starts in PHP (that's my guess). In the POST details of the request I see something like this:

trackingNumber: \u000xxx
foo: bar

and when I print it in the PHP controller, I get:

 " 0xxx"

PHP is an old version, 5.3.3.

What I have tried:

iconv('UTF-8', 'ISO-8859-1',$data);

I'd like to be able to post the full tracking number via PHP (with %000 instead of " 0") and to understand the issue.

Accepted answer by Sammitch:

Your root problem is that % is significant in URL encoding: %00 decodes to a null (zero) byte. So before you include data in a URL, you should urlencode() it.

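$trackingNumber = "%000xxx";
$foo = "bar";

// Escape every value before putting it into the query string: "%" becomes "%25".
$url = 'endpoint/?trackingNumber=' . urlencode($trackingNumber) . '&foo=' . urlencode($foo);

// Read it back the way the receiving side would. parse_url($url, PHP_URL_QUERY)
// also works on PHP 5.3, which does not support parse_url($url)['query'].
parse_str(parse_url($url, PHP_URL_QUERY), $parsed);

var_dump(
    $url,
    $parsed
);

Output:

string(42) "endpoint/?trackingNumber=%25000xxx&foo=bar"
array(2) {
  ["trackingNumber"]=>
  string(7) "%000xxx"
  ["foo"]=>
  string(3) "bar"
}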
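The same failure and fix can be shown on the JavaScript side, where the question builds the request. This is a minimal sketch, assuming the AJAX call assembles the query string by hand (the question does not show that code); `buildQuery` is a hypothetical helper, `encodeURIComponent()` plays the role of PHP's urlencode(), and `decodeURIComponent()` stands in for the decoding PHP performs on incoming form data:

```javascript
// Hypothetical helper: escapes every key and value before joining them,
// the client-side counterpart of calling urlencode() on each value in PHP.
function buildQuery(params) {
  return Object.keys(params)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(params[key]))
    .join("&");
}

const raw = "%000xxx";

// Unescaped, "%00" is read as a percent-escape and decoded to U+0000 (NUL),
// which is most likely what the "\u000..." and " 0xxx" in the question are showing.
console.log(JSON.stringify(decodeURIComponent(raw))); // "\u00000xxx"

// Escaped, "%" is sent as "%25", so the value survives decoding intact.
const query = buildQuery({ trackingNumber: raw, foo: "bar" });
console.log(query); // trackingNumber=%25000xxx&foo=bar
console.log(new URLSearchParams(query).get("trackingNumber")); // %000xxx
```

Building the body with `URLSearchParams` instead of string concatenation would apply the same escaping automatically.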

Additionally, though the encoding does not seem to be significant in this specific case, you need to be careful with your encoding choices. Windows cpXXXX encodings and ISO-8859-X encodings are not equivalent and should not be interchanged; note also that the ISO-8859-1 target in your iconv() call is Western European, not the Central European counterpart of cp1250. PHP can convert to either type of encoding if necessary, e.g.:

iconv('UTF-8', 'cp1250', $data);
iconv('UTF-8', 'ISO-8859-2', $data); // cp1250's rough equivalent in 8859, illustrative only
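To make "not equivalent" concrete, here is a small sketch in JavaScript (the question's client-side language) that decodes the same bytes under both labels with the `TextDecoder` API. The byte values are arbitrary sample data, and the sketch assumes a runtime whose `TextDecoder` supports the `windows-1250` and `iso-8859-2` labels (modern browsers, or Node.js built with full ICU):

```javascript
// Two bytes, decoded under two labels that are often assumed to be interchangeable.
// Assumes a TextDecoder that supports both labels (modern browsers, Node.js with full ICU).
const bytes = new Uint8Array([0x8a, 0xa9]);

const asCp1250 = new TextDecoder("windows-1250").decode(bytes);
const asLatin2 = new TextDecoder("iso-8859-2").decode(bytes);

// Print code points, since one of the results is an invisible control character.
const codePoints = (s) =>
  Array.from(s, (c) => "U+" + c.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")).join(" ");

// windows-1250 puts letters such as "Š" in 0x80-0x9F; ISO-8859-2 keeps that range
// for C1 control codes and places "Š" at 0xA9, where windows-1250 has "©".
console.log(codePoints(asCp1250)); // U+0160 U+00A9
console.log(codePoints(asLatin2)); // U+008A U+0160
```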

Also, cp1250 (Central European) is not widely used, so unless you're working on a legacy system in Central or Eastern Europe it's probably not that. Maybe cp1252 (Western European)?
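Picking the right one matters because both are single-byte encodings: almost any byte sequence is "valid" in each, so nothing fails loudly when you choose the wrong one. A sketch under the same `TextDecoder` assumption, using hypothetical sample bytes that spell the Polish word "Łąka" in cp1250:

```javascript
// Hypothetical sample data: four bytes with no encoding label attached.
// Assumes TextDecoder supports these labels (modern browsers, Node.js with full ICU).
const unlabeled = new Uint8Array([0xa3, 0xb9, 0x6b, 0x61]);

for (const label of ["windows-1250", "windows-1252"]) {
  // fatal: true makes decode() throw on invalid input; neither label throws here,
  // so an error-based check cannot tell which encoding is the right one.
  const text = new TextDecoder(label, { fatal: true }).decode(unlabeled);
  console.log(label, text);
}
// windows-1250 Łąka
// windows-1252 £¹ka
```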

Lastly, as general advice: text encoding is metadata that should always be known, never guessed, and anything that claims to "detect" an encoding is also guessing.

See: