```go
var int32s = []int32{
	8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
}
fmt.Println("word: ", string(int32s))
```
```js
let int32s = [8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26]
str = String.fromCharCode.apply(null, int32s);
console.log("word: " + String.fromCharCode.apply(null, int32s))
```
The two results above are not the same: they differ in some "empty" characters.
Is there any way to modify the Go code so that it generates the same result as the JS one?
According to the docs on String.fromCharCode, the method returns a string created from the specified sequence of UTF-16 code units.

So each number in your int32s array is interpreted as a 16-bit integer providing a Unicode code unit, so that the whole sequence is interpreted as a series of code units forming a UTF-16-encoded string.

I'd stress the last point because, judging from the name of the variable, int32s, whoever wrote the JS code appears to have an incorrect idea about what is happening there.

Now back to the Go counterpart. Go does not have built-in support for UTF-16 encodings; its strings are normally encoded using UTF-8 (though they are not required to be, but let's not digress), and Go also provides the rune data type, which is an alias for int32. A rune is a Unicode code point, that is, a number which is able to contain a complete Unicode character. (I'll get back to this fact and its relation to the JS code in a moment.)

Now, what's wrong with your string(int32s) is that it interprets your slice of int32s in the same way as []rune (remember that a rune is an alias for int32), so it takes each number in the slice to represent a single Unicode character and produces a string of them. (This string is internally encoded as UTF-8, but this fact is not really relevant to the problem.)

In other words, the difference is this: the JS code treats each number as a UTF-16 code unit, while the Go code treats each number as a complete Unicode code point.
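To illustrate why the distinction between code units and code points matters, here is a minimal sketch using a surrogate pair, which is where the two interpretations diverge. The helper names asCodePoints and asCodeUnits are made up for this example and are not part of any library.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// asCodePoints treats every number as a complete Unicode code point,
// which is what a plain string([]rune) conversion does.
func asCodePoints(nums []uint16) string {
	runes := make([]rune, len(nums))
	for i, n := range nums {
		runes[i] = rune(n)
	}
	return string(runes)
}

// asCodeUnits treats the numbers as UTF-16 code units, combining
// surrogate pairs into single code points before building the string.
func asCodeUnits(nums []uint16) string {
	return string(utf16.Decode(nums))
}

func main() {
	// 0xD83D 0xDE00 is the UTF-16 surrogate pair for U+1F600.
	pair := []uint16{0xD83D, 0xDE00}
	fmt.Printf("%+q\n", asCodePoints(pair)) // "\ufffd\ufffd"
	fmt.Printf("%+q\n", asCodeUnits(pair))  // "\U0001f600"
}
```

A lone surrogate is not a valid code point, so the direct conversion turns each half into U+FFFD, while decoding as UTF-16 yields the single intended character.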
The Go standard library provides a package for dealing with the UTF-16 encoding, unicode/utf16, and we can use it to do what the JS code does: decode a UTF-16-encoded string into a sequence of Unicode code points, which we can then convert to a Go string (Playground).
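A minimal sketch of that decoding step could look like the following. It uses utf16.Decode from the standard unicode/utf16 package; the names uint16s, runes and decodeUTF16 are chosen here for illustration.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUTF16 interprets units as UTF-16 code units and returns
// the corresponding (UTF-8 encoded) Go string.
func decodeUTF16(units []uint16) string {
	// Decode into an explicitly named slice of code points first.
	runes := utf16.Decode(units)
	return string(runes)
}

func main() {
	uint16s := []uint16{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	fmt.Println("word: ", decodeUTF16(uint16s))
}
```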
(Note that I've changed the type of the slice to []uint16 and renamed it accordingly. Also, I've decoded the source slice into an explicitly named variable; this is done for clarity, to highlight what's happening.)

This code produces the same gibberish as the JS code does in the Firefox console.
Update on the "empty characters" bit, which I did not touch.
The problem, as I understand it, is that your Go code prints something like
ýP8ÜÙ*ë!ÓçØê
while the JS code prints
�ýP8�ÜÙ*ë!Ó�çØê�
right?
The problem here is in the different ways fmt.Println and console.log interpret the resulting string.

Let me first state that your Go code happens to work correctly without the proper decoding I've suggested: all the integers in the slice are UTF-16 code units in the "basic" range (none of them falls in the surrogate range 0xD800 to 0xDFFF), so the "dumb" conversion works and produces the same string as the JS code does.
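This claim can be checked directly. The sketch below compares the direct conversion with proper UTF-16 decoding; sameAsDecoded is a hypothetical helper written for this check, and truncating each number to 16 bits is an assumption meant to mirror how String.fromCharCode treats its arguments.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// sameAsDecoded reports whether converting nums directly to a string
// gives the same result as decoding them as UTF-16 code units.
func sameAsDecoded(nums []int32) bool {
	units := make([]uint16, len(nums))
	for i, n := range nums {
		units[i] = uint16(n) // keep only the low 16 bits
	}
	return string(nums) == string(utf16.Decode(units))
}

func main() {
	int32s := []int32{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	fmt.Println(sameAsDecoded(int32s)) // true

	// A surrogate pair is where the two approaches disagree.
	fmt.Println(sameAsDecoded([]int32{0xD83D, 0xDE00})) // false
}
```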
To see both strings "as is", you could do this:

For Go, use fmt.Printf with the %q verb to see "special" Unicode (and ASCII) characters "escaped" using the Go rules in the printout:

    fmt.Printf("%q\n", string(int32s))

produces

    "\býP8\x1eÜÙ*ë!Ó\x17çØê\x1a"

Notice the '\b', '\x1e' and other escapes: they stand for U+0008 (backspace), U+001E (record separator), U+0017 (end of transmission block) and U+001A (substitute).

As you can see, these are control characters, which are not printable.
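To confirm which characters in the string are control characters, one could filter it with unicode.IsControl from the standard library. This is only a sketch; controlRunes is an illustrative helper name.

```go
package main

import (
	"fmt"
	"unicode"
)

// controlRunes returns the control characters found in s, in order.
func controlRunes(s string) []rune {
	var out []rune
	for _, r := range s {
		if unicode.IsControl(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	s := string([]rune{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	})
	// Print each control character as a code point and as a Go escape.
	for _, r := range controlRunes(s) {
		fmt.Printf("%U %q\n", r, r)
	}
}
```

For the slice from the question, this reports four code points: U+0008, U+001E, U+0017 and U+001A.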
For JS, print the value of the resulting string without using console.log: just save its value in a variable, then enter its name at the console and hit Enter, to have its value printed "as is".

Note that the string contains "\uXXXX" escapes. They define Unicode code points (BTW, Go supports the same syntax), and these escapes define the same code points as can be seen in the Go example: U+0008, U+001E, U+0017 and U+001A.
As you can see, the strings produced are the same; the only difference is that Go's string is encoded in UTF-8, and because of this, peering into its contents using fmt.Printf and %q looks at the encoded bytes, and that's why Go prints their "escapes" using the "minimal" encoding. But we could use the escaping from the JS example as well: you can check that running

    fmt.Println("\býP8\x1eÜÙ*ë!Ó\x17çØê\x1a" == "\u0008ýP8\u001eÜÙ*ë!Ó\u0017çØê\u001a")

prints true.

So, as you can see by now, console.log replaces each non-printable character with the special Unicode code point U+FFFD, which is called the Unicode replacement character and is usually rendered as a black rhombus with a white question mark in it.

Go's fmt.Println does not do that: it merely sends these bytes "as is" to the output.

Hope this explains the observed difference.
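If the goal is to make the Go output look like what the JS console displays, one option is to do the substitution yourself before printing. The following is a minimal sketch, assuming the console substitutes U+FFFD for every control character; replaceControl is a made-up helper name, and note that it would also replace newlines and tabs.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// replaceControl returns a copy of s in which every control character
// is replaced with U+FFFD, the Unicode replacement character.
func replaceControl(s string) string {
	return strings.Map(func(r rune) rune {
		if unicode.IsControl(r) {
			return unicode.ReplacementChar
		}
		return r
	}, s)
}

func main() {
	int32s := []int32{
		8, 253, 80, 56, 30, 220, 217, 42, 235, 33, 211, 23, 231, 216, 234, 26,
	}
	fmt.Println("word: ", replaceControl(string(int32s)))
}
```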