In C, I am trying to read and write unsigned char (0 - 255) extended ASCII characters (Unicode) from and to the console under Windows (cross-platform compatibility is needed).
Under extended ASCII (Unicode), code point 255 is ÿ and code point 220 is Ü.
Right now I have the following code for writing and reading.
#include <stdio.h>
#include <locale.h>
#include <wchar.h>

int main(void) {
    setlocale(LC_ALL, "");  /* use the user's default locale */
    unsigned char ch = 255;
    wprintf(L"Character %d = %lc\n", ch, ch);
    wprintf(L"Enter a character: ");
    wscanf(L"%lc", &ch);
    wprintf(L"Character %d = %lc\n", ch, ch);
    return 0;
}
The output is:
Character 255 = ÿ
Enter a character: ÿ
Character 220 = Ü
As is evident, code point 255 is displayed properly as ÿ. However, when ÿ is entered as input, it is read as code point 220. Consequently, when code point 220 is printed, it is displayed as Ü.
Thus, writing works fine. However, when reading characters above 127 (128 - 255), the code point that is read is lower than the actual one: in this example, 255 comes back as 220, a difference of 35.
Can you please help me understand what I am doing wrong and how I can fix this?
%lc takes a wide character, wchar_t. "Wide" means it is wider than one byte, but the exact size is implementation-specific (2 bytes on Windows, 4 bytes on most Unix-like systems). Passing wscanf the address of a 1-byte unsigned char for %lc causes undefined behavior: wscanf stores a whole wchar_t there, writing past the end of the variable (one extra byte with a 2-byte wchar_t, three with a 4-byte one).

But if you're using 1-byte characters, you don't need wprintf or wscanf. Just use printf and scanf.

And, as noted by others, "extended ASCII" is not "Unicode". See this question for more.