encodeURIComponent appears to add a character to my string


jQuery.ajax() is doing something weird when escaping my data.

For example, if I send the request:

$.ajax({
    url: 'somethinguninteresting',
    data: {
        name: 'Ihave¬aweirdcharacter'
    }
});

and then inspect the XHR in Chrome DevTools, the "Request Payload" is shown as name=Ihave%C2%ACaweirdcharacter

Now, I've figured out that:

'¬'.charCodeAt(0) === 172

and that 172 is AC in hexadecimal.

Working backwards, C2 (the "extra" character being prepended) in hexadecimal is 194 in decimal, and

String.fromCharCode(194) === 'Â'

My Question:

Why does

encodeURIComponent('¬')

return '%C2%AC', which would appear to be the result of calling

encodeURIComponent('Â¬')

(which itself returns '%C3%82%C2%AC')?


There are 2 answers

3
Ja͢ck On BEST ANSWER

Although JavaScript uses UTF-16 (or UCS-2) internally, it performs URI encoding based on UTF-8.

The code point 172 (U+00AC) is above 127, so it falls outside ASCII and UTF-8 encodes it in two bytes. The two-byte layout in UTF-8 is:

110xxxxx 10xxxxxx

The eleven x positions are filled with the binary representation of 172, which is 10101100, padded on the left with three zero bits:

11000010 10101100 = C2AC
   ^^^
   pad

This outcome is then percent encoded to finally form %C2%AC which is what you saw in the request payload.
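The bit layout above can be reproduced by hand. The following is only a minimal sketch: percentEncodeTwoByte is a hypothetical helper name, and it assumes the code point falls in the two-byte range (U+0080 to U+07FF). It is not how any engine implements encodeURIComponent.

```javascript
// Hypothetical helper: encodes one code point in the two-byte UTF-8 range
// (U+0080 to U+07FF) and percent-encodes the resulting bytes.
function percentEncodeTwoByte(codePoint) {
    if (codePoint < 0x80 || codePoint > 0x7FF) {
        throw new RangeError('only handles code points U+0080 to U+07FF');
    }
    // 110xxxxx: marker bits plus the upper 5 of the 11 payload bits
    const first = 0xC0 | (codePoint >> 6);
    // 10xxxxxx: marker bits plus the lower 6 payload bits
    const second = 0x80 | (codePoint & 0x3F);
    return [first, second]
        .map(function (b) { return '%' + b.toString(16).toUpperCase(); })
        .join('');
}

const manual = percentEncodeTwoByte('¬'.charCodeAt(0));
console.log(manual);                              // '%C2%AC'
console.log(manual === encodeURIComponent('¬'));  // true
```

Running the same helper on 194 (the code point of 'Â') yields '%C3%82', which is where the first half of '%C3%82%C2%AC' in the question comes from.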

6
Justin Howard On

URL encoding (or percent encoding) encodes Unicode characters using UTF-8. UTF-8 encodes characters with varying numbers of bytes. The ¬ character is encoded in UTF-8 as the two bytes C2 AC.

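The charCodeAt method returns a UTF-16 code unit (172 here), not the UTF-8 byte sequence, so its result does not match the percent-encoded bytes for anything outside ASCII. See this answer https://stackoverflow.com/a/18729931/4231110 for more details on how to use charCodeAt to encode a string with UTF-8.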

In short, %C2%AC is the correct percent encoding of ¬. This can be demonstrated by running

decodeURIComponent('%C2%AC') // '¬'
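To see the difference between the UTF-16 code unit and the UTF-8 bytes side by side, something like the following could be used. It is a rough sketch that assumes an environment where TextEncoder is available as a global (modern browsers and recent Node.js versions); the variable names are illustrative only.

```javascript
const ch = '¬';

// UTF-16 code unit, which is what charCodeAt reports
const codeUnit = ch.charCodeAt(0);

// UTF-8 bytes, which is what percent encoding is based on
const utf8Bytes = Array.from(new TextEncoder().encode(ch));

// Percent-encode each byte by hand
const encoded = utf8Bytes
    .map(function (b) { return '%' + b.toString(16).toUpperCase().padStart(2, '0'); })
    .join('');

console.log(codeUnit);                             // 172
console.log(utf8Bytes);                            // [ 194, 172 ]
console.log(encoded === encodeURIComponent(ch));   // true
console.log(decodeURIComponent(encoded) === ch);   // true
```

The 194 in the byte array is the C2 that looked like an extra character: it is the lead byte of the two-byte sequence, not a separate 'Â'.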