Extended Ascii doesn't work in console!


For example, System.out.println("╚"); displays as a ?, and the same goes for System.out.println("\u255a");

Why doesn't this work? Stdout does indeed support these characters so I don't get it.

There are 3 answers

Josh Lee On BEST ANSWER

See this question. When Java's default character encoding is not UTF-8 (as seems to be the case on Windows and OS X, but not Linux), characters that cannot be encoded are converted to question marks. You can pass the correct switch to the JVM's command line (-Dfile.encoding=UTF-8 on some terminals, but I don't have a Windows box in front of me), or you can set an environment variable. Portably determining what the encoding should be might be impossible, but if you know that you will always run on the Win32 console, for example, you can choose a Charset to explicitly encode the characters before writing them to standard output, or you can write the bytes you need directly.
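As a rough sketch of the "choose a Charset explicitly" option, the snippet below wraps an output stream in a PrintStream that encodes with a named charset instead of the JVM default. The class and helper names (ExplicitEncodingDemo, withCharset, encode) are made up for illustration, and "UTF-8" is only an assumption about what the terminal expects; on a Win32 console you would pass whatever code page it is actually using.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class ExplicitEncodingDemo {

    // Hypothetical helper: wrap a stream so that text is encoded with the
    // named charset rather than the JVM's default encoding.
    static PrintStream withCharset(OutputStream out, String charsetName) {
        try {
            return new PrintStream(out, true, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    // Encode text into an in-memory buffer so the resulting bytes can be inspected.
    static byte[] encode(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream ps = withCharset(buffer, charsetName);
        ps.print(text);
        ps.flush();
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: the terminal attached to stdout interprets bytes as UTF-8.
        PrintStream out = withCharset(System.out, "UTF-8");
        out.println("\u255a");
    }
}
```

With "UTF-8", U+255A is written as the three bytes E2 95 9A. Whether those bytes show up as ╚ still depends on the terminal decoding them with the same charset.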

McDowell On

The Windows command prompt uses old DOS OEM encodings by default. System.out uses the default system encoding, which will be a Windows "ANSI" encoding. However, System.console() detects the encoding of the console.
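A minimal sketch of the System.console() route follows. System.console() can return null when there is no interactive console (for example when output is redirected), so the hypothetical helper consoleOrStdout falls back to System.out in that case. The class and method names are illustrative only.

```java
import java.io.Console;
import java.io.PrintWriter;

final class ConsoleWriterDemo {

    // Hypothetical helper: prefer the console's own writer, which uses the
    // console's encoding, and fall back to System.out when no console exists.
    static PrintWriter consoleOrStdout() {
        Console console = System.console();
        if (console != null) {
            return console.writer();
        }
        // Fallback: this writer uses the JVM default encoding, like System.out.
        return new PrintWriter(System.out, true);
    }

    public static void main(String[] args) {
        PrintWriter out = consoleOrStdout();
        out.println("\u255a");
        out.flush();
    }
}
```

The fallback branch has the same encoding problem as plain System.out, so this only helps when the program is really attached to a console.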

U+255A (╚) is more likely to be supported by the OEM code pages, because the Windows "ANSI" code pages use those byte ranges for accented characters instead.
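To illustrate the difference, the sketch below encodes U+255A with one OEM code page and one "ANSI" code page. It assumes the JRE ships charsets named IBM437 and windows-1252 (common, but not guaranteed by the Java specification), and the class and method names are invented for the example.

```java
import java.nio.charset.Charset;

final class CodePageDemo {

    // Return the first encoded byte of the text as an unsigned value.
    // String.getBytes(Charset) substitutes the charset's replacement byte
    // for characters it cannot map.
    static int firstByte(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        return bytes[0] & 0xFF;
    }

    // Decode a single byte value with the named charset.
    static String decodeByte(int value, String charsetName) {
        return new String(new byte[] {(byte) value}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // Assumption: both charsets are available in this JRE.
        System.out.printf("IBM437:       0x%02X%n", firstByte("\u255a", "IBM437"));
        System.out.printf("windows-1252: 0x%02X%n", firstByte("\u255a", "windows-1252"));
        System.out.printf("0xC8 in windows-1252 is U+%04X%n",
                (int) decodeByte(0xC8, "windows-1252").charAt(0));
    }
}
```

In IBM437 the character maps to byte 0xC8, while windows-1252 has no mapping for it and produces 0x3F, which is a literal question mark. The same byte 0xC8 decoded as windows-1252 is È (U+00C8), an accented letter.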

You can read more here, here, here and here.

Personally, I would avoid the -Dfile.encoding option with codepage 65001 as this produces unintended side-effects in both the console (batch files stop working) and Java (bugs).

hippietrail On

If you are using Windows, the console is not UTF-8 but UTF-16, which is the same native encoding that Java uses, so you should be able to print wide character strings directly.

I'm not a Java programmer but in the case of C you have to call _setmode() with the special mode _O_U16TEXT before printing UTF-16 will actually work.

If you want to print multibyte character strings instead, you can set the Windows console to UTF-8 from the command line with chcp 65001, or programmatically with the Win32 API function SetConsoleOutputCP(). Beware of a bug, though: WriteFile() returns the number of characters written instead of the number of bytes written that its documentation specifies. This bug corrupts UTF-8 output on the Windows console from Perl, PHP and Ruby. I believe even MSVCRT falls victim.

Good luck!