I'm reading an XML file that contains German, French, Spanish, English and Polish text.
To handle the Polish letters (which caused the most trouble), I tried this:
File file = new File(path);
InputStream is = new FileInputStream(file);
Reader reader = new InputStreamReader(is, charset);
InputSource src = new InputSource(reader);
src.setEncoding(charset.name()); // has no effect here, because a character stream (Reader) is supplied
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse(src, handler);
The problem I encountered was that none of the default charsets displays the text properly. Some produce question marks, others produce combinations of unrelated characters, e.g. ÄÖ..
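For comparison, here is a minimal sketch of a variant that hands the parser the raw bytes instead of a Reader, so the parser can pick the encoding from the XML declaration itself. This is not the code from the project above: the class name XmlEncodingSketch, the helper parseText and the sample document are made up for illustration, and it assumes the file declares its encoding correctly.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class XmlEncodingSketch {
    // Hypothetical helper: parses the bytes and returns all character data.
    static String parseText(byte[] xmlBytes) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
        // Passing a byte stream (no Reader) lets the parser detect the encoding.
        try (InputStream is = new ByteArrayInputStream(xmlBytes)) {
            saxParser.parse(is, handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Unicode escapes are used so the sample does not depend on the source file encoding.
        String expected = "\u015B\u0142una d\u0142ugie";
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><t>" + expected + "</t>";
        String parsed = parseText(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(parsed.equals(expected)); // prints "true"
    }
}
```

The comparison is printed as a boolean on purpose, so the result does not depend on how the console renders Polish characters.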
To narrow it down, I wrote another snippet to test which charset works:
public static void main(String[] args){
Charset charset = StandardCharsets.UTF_8;
String chars = "śłuna długie";
System.out.println(new String(chars.getBytes(charset), charset));
}
Again, I tested every single charset, but none of them works. I hope you've got an idea.
My solution: change the encoding of your IDE.
I was using the default encoding of my IDE (IntelliJ), which was "windows-1252" because I'm using Windows on this PC. That charset has no ś or ł, so the string literal in the source file was most likely already damaged before the code ran.
So I changed it to UTF-8, and the short test code worked fine for me.
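To check this independently of the IDE setting, a small sketch along these lines may help. It writes the test string with Unicode escapes, so the literal cannot be altered by the source file encoding, and compares the round trip through UTF-8 and windows-1252. The class name CharsetRoundTrip and the helper roundTrip are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // The Polish sample text from the test above, written with escapes
    // so it stays the same whatever encoding the source file is saved in.
    static final String SAMPLE = "\u015B\u0142una d\u0142ugie";

    // Encodes the text with the given charset and decodes it again.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // UTF-8 can represent every character, so the text survives.
        System.out.println(SAMPLE.equals(roundTrip(SAMPLE, StandardCharsets.UTF_8))); // prints "true"
        // windows-1252 cannot encode the Polish letters; they become '?'.
        System.out.println(SAMPLE.equals(roundTrip(SAMPLE, cp1252))); // prints "false"
    }
}
```

If the first line prints true here but the original test with a plain literal still shows garbage, that points at the source file or console encoding rather than at the charset passed to getBytes.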