If I write:
char a = 'A';
printf("%x %c", a, a);
it will produce the output "41 A". Similarly, when I write
char32_t c = U'🍌';
printf("%x %c", c, c); //even tried %lc and %llc
it will produce the output "1f34c L" instead of the expected "1f34c 🍌"!
Is there something wrong here? How can I print char16_t and char32_t characters onto stdout?
Also, which format specifier should I use to get char16_t / char32_t input from scanf?
char32_t c;
scanf("%c", &c); //
printf("%x %c", c, c);
this will produce the output "f0 �".
char16_t and char32_t are nothing special. They are really just uint_least16_t and uint_least32_t, and library support for them is thin. They are basically only used for u and U literals. They may not be UTF-16 and UTF-32: check the __STDC_UTF_16__ and __STDC_UTF_32__ macros before assuming they are. The standard provides only very basic conversion functions, which convert char16_t or char32_t to the multibyte encoding and back. To do anything more with them, you have to implement it yourself.

The C language really has two encodings: the locale-dependent multibyte character representation and the wide character representation.
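
The macro check mentioned above can be sketched roughly as follows. This is only an illustration: the helper names char16_is_utf16, char32_is_utf32 and char32_is_uint_least32 are made up for this example and are not part of any library.

```c
#include <stdbool.h>
#include <stdint.h>
#include <uchar.h>

/* Hypothetical helper: true when the implementation promises that
   char16_t values (including u'...' literals) are UTF-16 encoded. */
static bool char16_is_utf16(void)
{
#ifdef __STDC_UTF_16__
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: true when the implementation promises that
   char32_t values (including U'...' literals) are UTF-32 encoded. */
static bool char32_is_utf32(void)
{
#ifdef __STDC_UTF_32__
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: returns 1 when char32_t is the very same type as
   uint_least32_t, which is what <uchar.h> specifies in C (not in C++). */
static int char32_is_uint_least32(void)
{
    return _Generic((char32_t)0, uint_least32_t: 1, default: 0);
}
```

If char32_is_utf32() returns false, the numeric value of a U'...' literal cannot be assumed to be a Unicode code point, and any code that treats it as one is not portable.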
The '🍌' character you typed in your source file is interpreted by the compiler as some implementation-specific value. gcc would encode it as UTF-8, and the gcc preprocessor then shifts the byte values left, so '🍌' is equal to (int)0xF09F8D8C on gcc; the behavior of multi-character literals such as 'something' is implementation-defined. The value of that character is then assigned to the char32_t. That is not a UTF-32 value at all.

To print them, convert them to a multibyte string, then just print it with %s.

Printing is locale-dependent: output is produced in the locale specified by the user. The default locale is C, which has no UTF support. So first you have to set your locale to something UTF-compatible, then call c32rtomb. Note that in glibc a stream chooses its encoding the first time it is printed to, so make sure to call setlocale before doing anything with the stream you want to work with.

As for scanf: none, there is no such format specifier. You should use wchar_t or plain char strings to read characters from the user in the encoding specified by their locale. Then you can convert to/from char16_t and char32_t if you want. If you specifically want to read UTF-32 characters, you have to write that yourself to be sure your code reads UTF-32 characters. I recommend libunistring.
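
For illustration, here is a minimal sketch of the two conversions described above, using the standard c32rtomb and mbrtoc32 functions from <uchar.h>. The helper names c32_to_mbs and mbs_to_c32 are made up for this example, and it assumes that __STDC_UTF_32__ is defined and that the program has already switched to a UTF-8 locale.

```c
#include <assert.h>
#include <limits.h>   /* MB_LEN_MAX */
#include <locale.h>   /* setlocale, which the caller runs first */
#include <stdio.h>
#include <string.h>
#include <uchar.h>

/* Hypothetical helper: converts one char32_t into a NUL-terminated
   multibyte string in the current locale's encoding. `out` must have room
   for MB_LEN_MAX + 1 bytes. Returns the number of bytes written (not
   counting the terminator), or (size_t)-1 if the character cannot be
   represented in the current locale. */
static size_t c32_to_mbs(char32_t c, char *out)
{
    mbstate_t state;
    assert(out != NULL);
    memset(&state, 0, sizeof state);   /* initial conversion state */
    size_t n = c32rtomb(out, c, &state);
    if (n == (size_t)-1)
        return n;
    out[n] = '\0';
    return n;
}

/* Hypothetical helper: decodes the first character of the multibyte
   string `s` (at most `len` bytes) into *out. Returns the number of bytes
   consumed, or 0 for an invalid or incomplete sequence or a NUL input. */
static size_t mbs_to_c32(const char *s, size_t len, char32_t *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t n = mbrtoc32(out, s, len, &state);
    if (n == (size_t)-1 || n == (size_t)-2 || n == (size_t)-3)
        return 0;
    return n;
}
```

A typical use would be to call setlocale(LC_ALL, "") at the very start of main, then c32_to_mbs((char32_t)0x1F34C, buf) with a char buf[MB_LEN_MAX + 1], and finally printf("%s\n", buf). For input, read the bytes into a plain char buffer (for example with fgets) and pass them to mbs_to_c32. In the default C locale, glibc is expected to reject non-ASCII characters in c32rtomb, which is why the locale has to be set first.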