Java: BufferedReader Keeps Writing Values 128-159 as 63 When Converting to Char

I am trying to write a hex editor, and I store values by writing each one as a char to a text file. For some reason, every decimal value from 128 to 159 is being written or read (I'm not sure which) as 63. I isolated the problem in this minimal example:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.File;

public class Why {

    public static File file = new File("why.txt");

    public static void main(String[] args) throws IOException {
        if(!file.exists())
            file.createNewFile();

        BufferedWriter bw = new BufferedWriter(new FileWriter(file));
        bw.write((char) 144);
        bw.close();

        BufferedReader br = new BufferedReader(new FileReader(file));
        System.out.println(br.read());
        br.close();
    }
}

Any help is appreciated.

I figured it out using a FileOutputStream and FileInputStream. Thanks all.

There are 2 answers

Jon Skeet (best answer)

When you use FileReader and FileWriter, they will use the default encoding for your platform. That's almost always a bad idea.

In your case, it seems that that encoding doesn't support U+0090 (decimal 144), which is fairly reasonable given that it's a C1 control character - many encodings won't support that. I suspect you don't actually want (char) 144 at all. If you really, really want to use that character, you should use an encoding which can encode all of Unicode - I'd recommend UTF-8.
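As a rough sketch of the UTF-8 suggestion, the round trip below names the charset explicitly on both the writer and the reader instead of relying on the platform default. The class name Utf8RoundTrip, the roundTrip helper, and the temporary file are made up for illustration; they are not from the question.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class Utf8RoundTrip {

    // Hypothetical helper: writes one char as UTF-8 text, then reads it
    // back with the same charset and returns the char value that was read.
    static int roundTrip(Path path, char c) throws IOException {
        try (BufferedWriter bw = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
            bw.write(c);
        }
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return br.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // Scratch file chosen for this sketch only.
        Path path = Files.createTempFile("why", ".txt");
        System.out.println(roundTrip(path, (char) 144)); // prints 144
        System.out.println(Files.size(path));            // prints 2
    }
}
```

Note that the file holds two bytes (0xC2 0x90), not one: UTF-8 preserves the character, but it does not store the value 144 as a single byte, which matters for a hex editor.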

It's important to differentiate between text and binary, however - if you're really just interested in bytes, then you shouldn't use a reader or writer at all - use an InputStream and an OutputStream. Hex editors are typically byte-oriented rather than text-oriented, although they may provide a text view as well (ideally with configurable encoding). If you want to know the exact bytes in the file, you should definitely be using FileInputStream.
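For the byte-oriented route, a minimal sketch might look like the following. No charset is involved, so each value from 128 to 159 comes back unchanged. The class name ByteRoundTrip, the roundTrip helper, and the temporary file are assumptions made for this example.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class ByteRoundTrip {

    // Hypothetical helper: writes a single byte and reads it back.
    static int roundTrip(File file, int value) throws IOException {
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(value); // writes only the low 8 bits of value
        }
        try (FileInputStream in = new FileInputStream(file)) {
            return in.read(); // returns 0..255, or -1 at end of file
        }
    }

    public static void main(String[] args) throws IOException {
        // Scratch file chosen for this sketch only.
        File file = File.createTempFile("why", ".bin");
        for (int value = 128; value <= 159; value++) {
            int read = roundTrip(file, value);
            if (read != value) {
                System.out.println("Mismatch: wrote " + value + ", read " + read);
            }
        }
        System.out.println(roundTrip(file, 144)); // prints 144
    }
}
```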

Kayaman

Character 63 is ?, which means that you're using an encoding that doesn't support the character you're attempting to write, so the encoder replaces it with ?.
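The replacement can be reproduced without touching a file. The sketch below uses US-ASCII purely as a stand-in for an encoding that cannot represent the character; the asker's actual platform encoding is not stated in the question. The class name ReplacementDemo and the encodedByte helper are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {

    // Hypothetical helper: encodes one char with US-ASCII and returns the
    // first resulting byte as an unsigned value.
    static int encodedByte(char c) {
        // String.getBytes(Charset) substitutes the charset's replacement
        // byte for characters it cannot map instead of throwing.
        byte[] bytes = String.valueOf(c).getBytes(StandardCharsets.US_ASCII);
        return bytes[0] & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(encodedByte((char) 144)); // prints 63
        System.out.println((char) 63);               // prints ?

        // An encoder can be asked up front whether a char is representable.
        System.out.println(StandardCharsets.US_ASCII.newEncoder().canEncode((char) 144)); // prints false
        System.out.println(StandardCharsets.UTF_8.newEncoder().canEncode((char) 144));    // prints true
    }
}
```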

This is the part where you should stop with your hex editor for a while and learn the magical (and terrible) world of character encodings, and why you can't ignore them.

Here's a great read: "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)" by Joel Spolsky. It's still as valid as it was back in 2003.