Please excuse me, but I really need to know how the incorporated Unicode version (Unicode 5) works in ECMAScript 4. I just need to know how it's encoded or decoded, or which encoding ECMAScript 4 uses. I'm talking about the encoding used for the char codes (or code points, I think) of strings.
I basically thought it was UTF-16, but in my tests it goes beyond U+10FFFF. The maximum character code I got in ECMAScript 4 without an exception was U+FFFFFF, except that when I use String.fromCharCode() to encode that character code, it results in U+1FFFFF (\u{...} generates up to 0xFFFFFF different characters, but String.fromCharCode() generates only up to 0x1FFFFF different characters). In ECMAScript 6 the maximum code point I can get is U+10FFFF, a small difference, and since it uses UCS-2 (at least in my browser, Chrome), ECMAScript 6 generates more code units (a code unit = 2 bytes) for the higher code points. I guess ECMAScript 6 has a small flaw when encoding code points with UCS-2 (though that's not a bug, just a small flaw); check my question if you want to know more.
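For reference, this is roughly what I mean by the ECMAScript 6 behavior (standard ES2015 APIs in a current engine, nothing ES4-specific):

// ECMAScript 6, in a current engine (e.g. Chrome):
const astral = '\u{10FFFF}';        // highest valid Unicode code point
astral.length;                      // 2: stored as two 16-bit code units (a surrogate pair)
astral.charCodeAt(0).toString(16);  // 'dbff', the high surrogate
astral.charCodeAt(1).toString(16);  // 'dfff', the low surrogate
astral.codePointAt(0).toString(16); // '10ffff', the full code point back
// '\u{110000}' is a SyntaxError at parse time: it's above U+10FFFF.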
Back in ECMAScript 4, 0xFFFFFF is the max char code (or code point...?). Why do I think it's a char code in ECMAScript 4? Maybe because there's no String#codePointAt or String#fromCodePoint like in ECMAScript 6, and it really goes beyond UCS-2. First, let me show you some tests using ECMAScript 4:
(Yes, ECMAScript 4 never shipped, only drafts, including an unfinished virtual machine for evaluating ECMAScript 4. http://ecmascript.org is down, but it's still on http://archive.org, so I've made a little copy of it in a 7Zip file.)
// Decimal: 16777215
const ch = 0xffffff;
const chString = '\u{ffffff}';
// Exceeds the maximum char code (or code point), so
// an exception gets thrown, as expected.
'\u{1000000}';
// Exceeds it too, but returns '\u{ charCode % 1000000 }' anyway.
String.fromCharCode(ch + 1);
// Correct.
chString.charCodeAt(0); // Code: 16777215
// I didn't expect this one!
String.fromCharCode(ch); // Gives me '\u{1fffff}' back.
// A Unicode char code (which is a code point, I think) always
// corresponds to exactly one character in the string.
chString.length; // 1
String.fromCharCode(ch).length; // 1
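For contrast, here's the same kind of probing in ECMAScript 6 (standard ES2015 behavior, shown only for comparison; it's not part of the ES4 tests):

// ECMAScript 6 comparison:
String.fromCharCode(0x10FFFF).charCodeAt(0).toString(16); // 'ffff': fromCharCode truncates each argument to 16 bits
String.fromCodePoint(0x10FFFF) === '\u{10FFFF}';          // true: fromCodePoint matches the \u{...} escape
'\u{10FFFF}'.codePointAt(0).toString(16);                 // '10ffff'
String.fromCodePoint(0x110000);                           // throws RangeError: above U+10FFFF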
The ECMAScript 4 overview doesn't go any further into this; it only mentions that the language incorporates Unicode 5, but not which encoding. So which encoding is incorporated in this case? It would also be nice to know why String.fromCharCode(charCode) differs from the \u{...} Unicode code point escape in the examples above.