I am trying to read a UTF-16 XML file with Java. The file was written with C#.
Here's the Java code:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class XMLReadTest
{
    public static void main(String[] s)
    {
        try
        {
            File fXmlFile = new File("C:\\my_file.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(fXmlFile);
            doc.getDocumentElement().normalize();
            NodeList nList = doc.getElementsByTagName("row");
            for (int temp = 0; temp < nList.getLength(); temp++)
            {
                Node nNode = nList.item(temp);
                if (nNode.getNodeType() == Node.ELEMENT_NODE)
                {
                    Element eElement = (Element) nNode;
                    System.out.println("FILE_NAME: " + eElement.getElementsByTagName("FILE_NAME").item(0).getTextContent());
                }
            }
        }
        catch(Exception ex)
        {
            ex.printStackTrace();
        }
    }
}
And here's the XML file:
<?xml version="1.0" encoding="utf-16" standalone="yes"?>
<docMetadata>
    <row>
        <FILE_NAME>Выписка_Винтовые насосы.pdf</FILE_NAME>
        <FILE_CAT>GENERAL</FILE_CAT>
    </row>
</docMetadata>
When I run this code in Eclipse with the encoding in the 'Common' tab of the Run/Debug settings window set to Default - Inherited (Cp1253), the output is wrong:
FILE_NAME: ???????_???????? ??????.pdf
When the selected encoding in the same tab is UTF-8, the output is OK:
FILE_NAME: Выписка_Винтовые насосы.pdf
What am I doing wrong?
How can I get the correct output with the default encoding (Cp1253) in the Eclipse project settings?
This code runs on a server where I don't want to change the default encoding of the virtual machine.
I have tested this code with both Java 7 and Java 8.
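To help narrow down where the characters might be lost, here is a minimal sketch that only looks at the encoding step, not at the parsing. It assumes the JVM provides the windows-1253 charset (the standard name for Cp1253); the class name EncodingCheck and the helper roundTrip are made up for illustration. It shows that Cyrillic letters cannot be represented in Cp1253, so any step that encodes the text to that charset would turn them into question marks, while UTF-8 preserves them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    // Hypothetical helper: encodes the text to bytes with the given charset and
    // decodes it again. Characters the charset cannot represent are replaced
    // with the charset's replacement byte during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "Выписка.pdf" written with Unicode escapes so the source file
        // encoding does not affect the test itself.
        String name = "\u0412\u044B\u043F\u0438\u0441\u043A\u0430.pdf";

        // Assumption: windows-1253 is available in this JVM.
        Charset greek = Charset.forName("windows-1253");

        // false: Cp1253 (Greek) has no mapping for Cyrillic letters.
        System.out.println(greek.newEncoder().canEncode(name));

        // true: the Cyrillic letters come back as '?' after the round trip.
        System.out.println(roundTrip(name, greek).equals("???????.pdf"));

        // true: UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));
    }
}
```

If the string coming out of the parser already contains the Cyrillic characters (for example, when compared against an escaped literal like the one above), the loss would be happening at an encoding step like this one rather than in the parser.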
I was using an old dom4j library to parse the XML, and that was causing the problem. Using the parser built into the Java 7 runtime solved the problem: