Wrong default system encoding detected in Java application


What is the problem?

I've noticed a strange problem: Java reports different default file encodings on the same machine and OS (Windows 10). If I run my Gradle application from a console, Charset.defaultCharset() returns Windows-1250. When I run it from IntelliJ (also as a Gradle app), it returns Windows-1252.

Even stranger, when I run it on a different computer with Windows 11, the results are reversed: Windows-1252 when running from a console and Windows-1250 in IntelliJ.

The correct system encoding for my OS (Polish version of Win 10/11) should always be Windows-1250 as far as I know.

I use AdoptOpenJDK 16, Gradle 7.0, and IntelliJ IDEA 2021.3.2.
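
For comparing the two launch environments, a small diagnostic along these lines may help. It is only a sketch: the class name EncodingReport is made up for illustration, sun.jnu.encoding is an internal, unsupported property, and native.encoding only exists on JDK 17 and later (so it prints null on JDK 16).

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: collects the encoding-related values the JVM sees at startup.
class EncodingReport {

    public static Map<String, String> collect() {
        Map<String, String> report = new LinkedHashMap<>();
        // The charset used by readers/writers created without an explicit charset.
        report.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        // file.encoding can be overridden with -Dfile.encoding=... on the JVM command line.
        // sun.jnu.encoding is internal; native.encoding is absent before JDK 17.
        for (String key : new String[] {"file.encoding", "sun.jnu.encoding", "native.encoding"}) {
            report.put(key, String.valueOf(System.getProperty(key)));
        }
        return report;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Running this once from the console and once from IntelliJ should show which of the values differ. If only file.encoding changes, one possible explanation is that a launcher passes a different -Dfile.encoding value to the JVM.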

Why is it important in my case?

My Java application executes external Python scripts and communicates with the Python processes created by ProcessBuilder via Process.getInputStream() and Process.getOutputStream(). When I send data containing non-ASCII characters through that stream, those characters are replaced with ? and are read as such on the Python side. For example, on the Java side I send a line like this:

try (var inputWriter = new BufferedWriter(new OutputStreamWriter(scriptProcess.getOutputStream()))) {
    inputWriter.write("Właściciel");
}

and on the Python side I am receiving this data like this:

from sys import stdin

inputBuffer = []
for line in stdin:
    inputBuffer.append(line.rstrip())

When I print inputBuffer or write it to a file, it shows W?a?ciciel. Note that this behavior doesn't depend on where the input string comes from: "Właściciel" can be read from a UTF-8, Windows-1250, or Windows-1252 file and the problem remains the same.

If I force the correct encoding by passing it as a parameter to the Writer:

var writer = new OutputStreamWriter(scriptProcess.getOutputStream(), "Windows-1250");

...then it works and the question marks disappear. But I feel that hardcoding the "system encoding" is not a good solution, because it will break if someone runs my app on Windows with other regional settings (e.g. English, where the default encoding is typically Windows-1252 rather than Windows-1250).

So my question is: is there another way to determine the valid system encoding, or to set up communication between processes that is independent of system encoding and regional settings?
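
One direction that might avoid depending on the system encoding at all is to pin both ends of the pipe to UTF-8. The following is a minimal sketch, not a verified fix: the class name Utf8Pipe, the helper names, and the "python" / "script.py" command are placeholders, and it assumes the child is CPython, which reads the PYTHONIOENCODING environment variable to choose the encoding of stdin, stdout, and stderr.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical helper: forces UTF-8 on both the Java and the Python side of the pipe.
class Utf8Pipe {

    // Asks the Python child to use UTF-8 for stdin/stdout/stderr,
    // regardless of the Windows code page it would otherwise pick.
    public static ProcessBuilder pythonProcess(List<String> command) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("PYTHONIOENCODING", "utf-8");
        return builder;
    }

    // Encodes characters as UTF-8 instead of relying on Charset.defaultCharset().
    public static BufferedWriter utf8Writer(OutputStream out) {
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "python" and "script.py" are placeholders for the real interpreter and script.
        Process scriptProcess = pythonProcess(List.of("python", "script.py"))
                .redirectOutput(ProcessBuilder.Redirect.INHERIT)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        try (BufferedWriter inputWriter = utf8Writer(scriptProcess.getOutputStream())) {
            inputWriter.write("Właściciel");
            inputWriter.newLine();
        }
        scriptProcess.waitFor();
    }
}
```

With this setup the Python loop over stdin should not need any changes. If the Java side also reads the script's output, the matching InputStreamReader would need StandardCharsets.UTF_8 as well. The string literal itself still depends on the source file being compiled with the right encoding.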
