Convert a C++ string to a char array while encoding it as UCS-2 (or UTF-8)

Take the following string

std::string input = "Test 😁";

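The UCS2 representation of this string in hex, big-endian, is (strictly speaking this is UTF-16, since the smiley needs a surrogate pair and UCS-2 has none)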

0x00 0x54 0x00 0x65 0x00 0x73 0x00 0x74 0x00 0x20 0xd8 0x3d 0xde 0x01

The last four bytes are there because the smiley needs additional code units to store its Unicode representation. Is there an easy way to convert this string into a char array with the values above? I'm looking at the ICU library and Boost.Locale, and while they seem to handle translating from one encoding to another, I don't see any readily available API that converts a string to a char*. Note that I say "UCS2", but my underlying C++17 string is actually UTF-8.
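For reference, the trailing 0xd8 0x3d 0xde 0x01 in the dump follows from the UTF-16 surrogate-pair rule. The sketch below assumes the smiley is U+1F601, which is the code point those four bytes decode to; `to_surrogates` is a hypothetical helper name, not a library function.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper: split a code point above U+FFFF into a UTF-16
// surrogate pair (high surrogate first, low surrogate second).
std::pair<std::uint16_t, std::uint16_t> to_surrogates(char32_t cp) {
    // Only code points in the supplementary planes need a pair.
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    // Subtracting 0x10000 leaves a 20-bit value.
    const std::uint32_t v = static_cast<std::uint32_t>(cp) - 0x10000;
    // The top 10 bits go into the high surrogate, the bottom 10 into the low one.
    const auto high = static_cast<std::uint16_t>(0xD800 + (v >> 10));
    const auto low  = static_cast<std::uint16_t>(0xDC00 + (v & 0x3FF));
    return {high, low};
}
```

For U+1F601 the 20-bit value is 0xF601, which splits into 0x3D and 0x201, giving the code units 0xD83D and 0xDE01. Written most significant byte first, these are the last four bytes of the dump.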

I could write my own routine that starts from the first character and keeps pushing bytes into a char array one at a time until it encounters a char whose value is greater than 127, then restarts the encoding with 2 bytes per character instead of 1. That potentially means a double pass, because I don't know in advance whether the input string contains characters that cannot be encoded in ANSI.
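A hand-written routine along these lines might look like the sketch below. It assumes the input is UTF-8 and the target is big-endian UTF-16, which matches the hex dump. Since every 16-bit code unit is written as two bytes, even for plain ASCII, a single pass is enough and no restart is needed. `utf8_to_utf16be` is a hypothetical name, and validation is deliberately simplified: overlong forms and encoded surrogates are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: convert UTF-8 text to UTF-16 big-endian bytes.
std::vector<unsigned char> utf8_to_utf16be(const std::string& in) {
    std::vector<unsigned char> out;
    // Append one 16-bit code unit, most significant byte first.
    auto put = [&out](std::uint32_t unit) {
        assert(unit <= 0xFFFF);
        out.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));
        out.push_back(static_cast<unsigned char>(unit & 0xFF));
    };
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        // The lead byte determines the sequence length and the first payload bits.
        if (lead < 0x80)                { cp = lead;        len = 1; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > in.size())
            throw std::invalid_argument("truncated UTF-8 sequence");
        // Each continuation byte contributes six more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        i += len;
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");
        if (cp < 0x10000) {
            put(cp);  // one code unit for the Basic Multilingual Plane
        } else {
            const std::uint32_t v = cp - 0x10000;  // 20-bit value
            put(0xD800 + (v >> 10));               // high surrogate
            put(0xDC00 + (v & 0x3FF));             // low surrogate
        }
    }
    return out;
}
```

If a char* is required, `reinterpret_cast<const char*>(bytes.data())` together with `bytes.size()` gives access to the result, where `bytes` is the returned vector. The buffer contains zero bytes, so it cannot be treated as a null-terminated C string and the length has to be passed alongside the pointer.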

But maybe there's a native way (or even a 3rd party lib) that does it already?
