I have 3rd party code which punycodes strings (escapes and unescapes). As Unicode input/output, it uses 32-bit Unicode strings (uint32_t-based), not 16-bit. My own input/output is BSTR (UTF 16-bit). How should I convert between 32-bit Unicode char array and BSTR (both directions)?
The code should work in Visual C++ 6.0 and later versions.
UTF-16 is the same as UTF-32 for code points at or below `0xFFFF` (the Basic Multilingual Plane); only code points above that need a surrogate pair. You can use the following conversion to display UTF-32 code points in Windows. Note: this is based on the Wikipedia UTF-16 article. I didn't add any error checks; it expects valid code points.
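A minimal sketch of that conversion, assuming the input is already a valid Unicode scalar value (no range checks). The helper name `codepoint_to_utf16` is mine, and I use `unsigned long` rather than `uint32_t` since `<cstdint>` isn't available in Visual C++ 6.0:

```cpp
#include <string>

// Convert one UTF-32 code point to UTF-16 (one or two wchar_t units).
// Assumes 'codepoint' is a valid Unicode scalar value; no error checks.
std::wstring codepoint_to_utf16(unsigned long codepoint)
{
    std::wstring result;
    if (codepoint < 0x10000)
    {
        // BMP code point: the UTF-16 unit equals the UTF-32 value.
        result += static_cast<wchar_t>(codepoint);
    }
    else
    {
        // Supplementary code point: split into a surrogate pair.
        codepoint -= 0x10000;
        result += static_cast<wchar_t>(0xD800 + (codepoint >> 10));    // high surrogate
        result += static_cast<wchar_t>(0xDC00 + (codepoint & 0x3FF));  // low surrogate
    }
    return result;
}
```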
For example, the following code should display a smiley face in Windows 10:
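Building on the `codepoint_to_utf16` sketch above (my helper, not part of the original answer), one way to show it is with `MessageBoxW`:

```cpp
#include <windows.h>
#include <string>

int main()
{
    // U+1F600 "grinning face" lies above 0xFFFF, so it becomes a surrogate pair.
    std::wstring smiley = codepoint_to_utf16(0x1F600);
    MessageBoxW(NULL, smiley.c_str(), L"UTF-16 test", MB_OK);
    return 0;
}
```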
Edit:
Obtaining UTF-16 from an array of UTF-32 code points, and the reverse operation:
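A sketch of the UTF-32 → UTF-16 direction, returned as a `BSTR` since that is what the question works with. `SysAllocStringLen` is the standard OLE allocator; the helper name `utf32_to_bstr` and the `unsigned long` element type are my assumptions:

```cpp
#include <windows.h>
#include <oleauto.h>
#include <string>

// Build a UTF-16 string from an array of UTF-32 code points and
// return it as a BSTR. No validation of the input values.
BSTR utf32_to_bstr(const unsigned long* codepoints, size_t count)
{
    std::wstring utf16;
    for (size_t i = 0; i < count; ++i)
    {
        unsigned long cp = codepoints[i];
        if (cp < 0x10000)
        {
            utf16 += static_cast<wchar_t>(cp);                     // single unit
        }
        else
        {
            cp -= 0x10000;
            utf16 += static_cast<wchar_t>(0xD800 + (cp >> 10));    // high surrogate
            utf16 += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
        }
    }
    // SysAllocStringLen copies the buffer and records the length explicitly.
    return SysAllocStringLen(utf16.c_str(), static_cast<UINT>(utf16.size()));
}
```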
A code point in a UTF-16 string can be one `wchar_t` long (2 bytes per code point), or two `wchar_t` units joined as a surrogate pair (4 bytes per code point). Any unit between `0xD800` and `0xE000` is part of such a pair; in well-formed UTF-16 the first unit of a pair is a high surrogate in the range `0xD800` to `0xDC00`, which indicates 4 bytes per code point. Example: