Receiving 65533 as char value for characters such as (à, Ø, æ, etc.)!


I've been trying for hours now to figure out why, when I enter a char like Ø in the console through the Scanner and then get its numeric value, I always end up with 65533 (just below 65535, the max value of an unsigned short).

This doesn't seem to be the case for basic Latin (ASCII) characters. Any idea why?

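import java.util.Scanner;

public class Main {
    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        char[] chars = sc.next().toCharArray();

        // Print the numeric (UTF-16 code unit) value of each char
        for (int i = 0; i < chars.length; i++) {
            System.out.println((int) chars[i]);
        }
    }
}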

There are 2 answers

2
Alohci

65533 = Unicode Character 'REPLACEMENT CHARACTER' (U+FFFD)

That is, your character could not be decoded correctly with the character encoding in use, so it was replaced by the fallback value.
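To make the replacement visible, here is a minimal sketch of the mismatch. It assumes the console sent Ø as the single byte 0xD8 (its value in both Cp1252 and ISO-8859-1); the class and method names are hypothetical and only serve the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeMismatchDemo {
    // Decodes the bytes with the given charset and returns the numeric
    // value of the first resulting char.
    public static int firstCharValue(byte[] bytes, Charset charset) {
        return new String(bytes, charset).charAt(0);
    }

    public static void main(String[] args) {
        // Assumption: Ø arrived as the single byte 0xD8.
        byte[] raw = {(byte) 0xD8};

        // 0xD8 alone is not a complete UTF-8 sequence, so the decoder
        // substitutes U+FFFD, the replacement character.
        System.out.println(firstCharValue(raw, StandardCharsets.UTF_8));      // 65533

        // Decoded with a matching single-byte charset, the same byte is Ø (U+00D8).
        System.out.println(firstCharValue(raw, StandardCharsets.ISO_8859_1)); // 216
    }
}
```

The same byte gives two different char values depending only on the charset used to decode it, which is why ASCII input (identical in both encodings) is unaffected.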

4
Vampire

You have an encoding problem.
The bytes that come through System.in are not in the encoding your Scanner uses to translate those bytes to characters.
I guess your System.in is in Cp1252 (the Windows default encoding) but your Scanner uses UTF-8 to decode the bytes.
The byte sequence is then not valid UTF-8, and so the replacement character is used instead.

If you create the scanner with Scanner sc = new Scanner(System.in, System.getProperty("file.encoding")); your code should probably work correctly everywhere.
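The effect of the charset argument can be checked without a real console. The sketch below is only an illustration under stated assumptions: it simulates the typed input with a ByteArrayInputStream containing the byte 0xD8 (Ø in Cp1252 and ISO-8859-1) followed by a newline, and the helper name readCharValues is hypothetical. Which charset actually matches your console depends on the platform and JVM settings.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.Arrays;
import java.util.Scanner;

class ScannerCharsetDemo {
    // Reads one token from the stream, decoding it with the named charset,
    // and returns the numeric value of each char in that token.
    public static int[] readCharValues(InputStream in, String charsetName) {
        Scanner sc = new Scanner(in, charsetName);
        char[] chars = sc.next().toCharArray();
        int[] values = new int[chars.length];
        for (int i = 0; i < chars.length; i++) {
            values[i] = chars[i];
        }
        return values;
    }

    public static void main(String[] args) {
        // Simulated console input: Ø as the single byte 0xD8, then Enter.
        byte[] typed = {(byte) 0xD8, '\n'};

        // Charset matches the bytes: the token decodes to Ø.
        System.out.println(Arrays.toString(
                readCharValues(new ByteArrayInputStream(typed), "ISO-8859-1"))); // [216]

        // Charset does not match: the byte is replaced with U+FFFD.
        System.out.println(Arrays.toString(
                readCharValues(new ByteArrayInputStream(typed), "UTF-8")));      // [65533]
    }
}
```

With real console input you would pass System.in instead of the simulated stream, together with whichever charset name your terminal actually uses.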