I am having trouble understanding the character set used for printing on the console in a Windows C programme. I have not found any question answering this directly (if there is one, a link would be appreciated).
When looking through some different character sets (UCS-2, ISO 8859-1, Unicode) I always find the character 'ý' after the character 'ü'. When I then wrote a C programme to print the characters on a console, the character "superscript 2" actually follows 'ü' (sorry, I don't know how to write the superscript character here). In the Visual Studio debugging environment 'ý' is still shown as following 'ü'.
My question is therefore: What character set is used by C to write on the console?
Those characters are the ISO Latin-1 versions of some of the extended ISO Latin-1 characters when encoded as UTF-8. It can be due to two causes:

`\u0080`...`\u002f`, is printed as two characters) and your terminal doesn't support UTF-8 output.

It depends. To support multibyte characters you need to do several things in C. I assume you have done nothing special beyond using the normal functions of C, which normally assume you are using 7-bit ASCII characters, and that the locale is set to `C` (this is no locale at all):

- Set the locale with `setlocale(3)`.
- Use the `wchar_t` versions of all routines that are going to use the type `wchar_t` (this type supports character sets of more than 256 characters, like Unicode).

You need to educate yourself, as from that point on `strlen()`, for example, will not be the routine to calculate a string length (it just counts the number of bytes in the passed string, which is `char` related and not `wchar_t` related), so you need to use `wcslen(3)` on `wchar_t` strings, or step through a multibyte `char` string with `mblen(3)`, instead (be very careful with the function prototypes, as some functions take a `wchar_t *` string, while others take a `char *` string).

Check the manual pages for routines like: `wcscoll(3)`, `strcoll(3)`, `strxfrm(3)`, `wcsxfrm(3)`, `wprintf(3)`, `fwprintf(3)`, `swprintf(3)`, `vfwprintf(3)`, `fwide(3)`, ...

I wrote a small version of the `cal(1)` command, and internationalized it to support foreign locales with complete international support (this includes the use of wide chars). You can get it here to see the complete thing, and to use a program that shows its output in the language you have configured for your session.

See also the manual page for the `locale(1)` command, to check the locale you have configured for your account.
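
To make the points about `setlocale(3)`, the wide routines and string lengths a bit more concrete, here is a minimal sketch in standard C. The helper names `mb_char_count` and `print_wide_line` are made up for illustration and are not part of any library. How the wide output actually renders on a given Windows console depends on its code page, which this sketch does not try to configure.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: count the characters (not bytes) in a multibyte
 * char string by stepping through it with mblen(). The result depends on
 * the current LC_CTYPE locale. Returns (size_t)-1 on an invalid sequence. */
size_t mb_char_count(const char *s)
{
    size_t count = 0;

    assert(s != NULL);
    (void)mblen(NULL, 0);               /* reset mblen's internal shift state */
    while (*s != '\0') {
        int n = mblen(s, MB_CUR_MAX);   /* bytes used by the next character */
        if (n <= 0)
            return (size_t)-1;          /* not a valid character in this locale */
        s += n;
        count++;
    }
    return count;
}

/* Hypothetical helper: adopt the locale configured in the environment and
 * print a wide string followed by a newline.
 * Returns 0 on success, -1 if the locale could not be set or output failed. */
int print_wide_line(const wchar_t *ws)
{
    assert(ws != NULL);
    if (setlocale(LC_ALL, "") == NULL)
        return -1;
    return wprintf(L"%ls\n", ws) < 0 ? -1 : 0;
}
```

Note that `wcslen(3)` counts `wchar_t` elements, while `strlen()` counts bytes, so the two can disagree for the same text. Once a UTF-8 locale is active, `mb_char_count("\xc3\xbc")` should count a single character (the UTF-8 encoding of 'ü') where `strlen()` reports two bytes. Also avoid mixing `printf` and `wprintf` on the same stream, since a stream's orientation is fixed by the first call (see `fwide(3)`).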