If I write:
char a = 'A';
printf("%x %c", a, a);
it will produce the output "41 A". Similarly, when I write
char32_t c = U'🍌';
printf("%x %c", c, c); // even tried %lc and %llc
it will produce the output "1f34c L" instead of expected "1f34c 🍌"!
Is there something wrong here? How can I print char16_t and char32_t characters onto stdout?
Also, which format specifier should I use to get char16_t / char32_t input from scanf?
char32_t c;
scanf("%c", &c); //
printf("%x %c", c, c);
This will produce the output "f0 �".
char16_t and char32_t are nothing special. They are really just uint_least16_t and uint_least32_t, and they do not have great support. About the only thing they are used for is u and U literals. They may not be UTF-16 and UTF-32: check the __STDC_UTF_16__ and __STDC_UTF_32__ macros before assuming they are. Only very basic conversion functions are in the standard: there are only functions to convert char16_t or char32_t into the multibyte encoding, and back. To do anything more with them, you have to implement it yourself.

The C language really has two encodings: the locale-dependent multibyte character representation and the wide character representation.
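The macro check mentioned above could look roughly like this. It is only a sketch: the function names char16_is_utf16, char32_is_utf32 and banana_from_literal are made up for illustration, and it assumes a C11 toolchain that ships <uchar.h>. The literal uses a universal character name so that the result does not depend on how the source file is encoded.

```c
#include <stdbool.h>
#include <uchar.h>

/* True only if the implementation promises that char16_t values are UTF-16. */
bool char16_is_utf16(void)
{
#if defined(__STDC_UTF_16__)
    return true;
#else
    return false;
#endif
}

/* True only if the implementation promises that char32_t values are UTF-32. */
bool char32_is_utf32(void)
{
#if defined(__STDC_UTF_32__)
    return true;
#else
    return false;
#endif
}

/* A U-prefixed literal; when __STDC_UTF_32__ is defined this is the
   code point U+1F34C itself. */
char32_t banana_from_literal(void)
{
    return U'\U0001F34C';
}
```

If char32_is_utf32() returns false, the numeric value of the literal is whatever the implementation chose, and code that treats it as a Unicode code point is not portable.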
A plain '🍌' literal (no U prefix) typed in a source file is interpreted by the compiler as some implementation-specific value. GCC reads the character as UTF-8 bytes, and its preprocessor then shifts those bytes left into one value, so '🍌' is equal to (int)0xF09F8D8C on GCC; the behavior of multi-character literals such as 'something' is implementation-defined. That value is then assigned to the char32_t. It is not a UTF-32 value at all.

To print char16_t and char32_t characters, convert them to a multibyte string. Then just print it with %s.

Printing data is locale dependent, as printing is done in the locale specified by the user. The default locale is "C", which has no UTF support. So first you have to set your locale to something UTF-compatible. Then call c32rtomb. Note that in glibc a stream chooses its encoding the first time it is printed to, so make sure to call setlocale before doing anything with the stream you want to work with.

As for scanf, there is no format specifier for char16_t or char32_t. You should use wchar_t or plain char strings to read characters from the user in the encoding specified by their locale. Then you can convert to/from char16_t and char32_t if you want. If you specifically want to read UTF-32 characters, you have to write that yourself to be sure your code really reads UTF-32 characters. I recommend libunistring.
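The two conversions described above (char32_t to multibyte for printing, multibyte to char32_t after reading) can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper names c32_to_mbs and mbs_to_c32 are hypothetical, it assumes a C11 library that provides <uchar.h> (glibc does), and non-ASCII characters only convert when a UTF-8 (or otherwise suitable) locale has been set.

```c
#include <limits.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Convert one char32_t into a NUL-terminated multibyte string in the
   encoding of the current locale. Returns 0 on success, -1 if the
   character cannot be represented in that locale. */
int c32_to_mbs(char32_t c, char out[MB_LEN_MAX + 1])
{
    mbstate_t state;
    memset(&state, 0, sizeof state);      /* start in the initial shift state */
    size_t n = c32rtomb(out, c, &state);
    if (n == (size_t)-1)
        return -1;
    out[n] = '\0';
    return 0;
}

/* Decode the first character of the multibyte string s (in the current
   locale's encoding) into *out. Returns 0 on success, -1 on an empty,
   invalid or incomplete sequence. */
int mbs_to_c32(const char *s, char32_t *out)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t n = mbrtoc32(out, s, strlen(s), &state);
    if (n == (size_t)-1 || n == (size_t)-2 || n == (size_t)-3)
        return -1;
    return 0;
}
```

A caller would first run setlocale(LC_ALL, "") before touching stdout or stdin. To print, fill a char buf[MB_LEN_MAX + 1] with c32_to_mbs(c, buf) and then call printf("%x %s\n", (unsigned)c, buf). To read, fetch a plain char string with something like scanf("%63s", word) and pass it to mbs_to_c32(word, &c).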