I ran into an interesting problem last week. Run the program below. It is very simple: it creates a dummy XML file, reads it with the standard library, and writes it back out to a second file.
Look through the generated gtest2.xml and you will see some content that appears out of nowhere.
In my case, this is a sample of the corrupted section (the location varies between machines):
<test>1924</test>
<test>1925</test>
<test>t>24</test>
<test>1927</test>
<test>1928</test>
<test>1929</test>
This does not happen if I change the XML version to 1.0. So is something wrong with my code, or with the JDK?
Here is the test code:
import java.io.File;
import java.io.PrintWriter;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;

public class DocumentBuilderCheck {

    public static void main(String[] args) throws Exception {
        // Generate the input, parse it into a DOM, then serialize the DOM again.
        String filename = "/tmp/gtest.xml";
        generateXmlFile(filename, 2500);
        Document doc = readXmlFile(filename);
        String filename2 = "/tmp/gtest2.xml";
        writeDocument(doc, filename2);
    }

    // Serializes the DOM to a file with the default (identity) transformer.
    private static void writeDocument(Document document, String filename) throws Exception {
        StreamResult streamResult = new StreamResult(filename);
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.transform(new DOMSource(document), streamResult);
    }

    // Parses the file with the default namespace-aware DocumentBuilder.
    private static Document readXmlFile(String filename) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new File(filename));
    }

    // Writes an XML 1.1 document with <total> numbered <test> elements on a single line.
    private static void generateXmlFile(String filename, int total) throws Exception {
        File f = new File(filename);
        try (PrintWriter pw = new PrintWriter(f)) {
            pw.write("<?xml version=\"1.1\" encoding=\"UTF-8\"?>");
            pw.write("<main_tag>");
            for (int i = 0; i < total; i++) {
                pw.write("<test>" + String.format("%04d", i) + "</test>");
            }
            pw.write("</main_tag>");
        }
    }
}
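To pin down exactly which entries are affected instead of scanning the file by eye, a small checker along the following lines might help. This is only a sketch: the class name TestElementCheck and the method findMismatches are made up for illustration, and it assumes the values are the zero-padded counters written by generateXmlFile above. It deliberately uses a regular expression rather than an XML parser, so the check does not depend on the parser under suspicion.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestElementCheck {

    // Matches each <test>...</test> element as raw text (no XML parsing involved).
    private static final Pattern TEST_ELEMENT =
            Pattern.compile("<test>(.*?)</test>", Pattern.DOTALL);

    // Returns one message per element whose text differs from the expected
    // zero-padded counter (0000, 0001, ...), counted in document order.
    static List<String> findMismatches(String xml) {
        List<String> mismatches = new ArrayList<>();
        Matcher m = TEST_ELEMENT.matcher(xml);
        int index = 0;
        while (m.find()) {
            String expected = String.format("%04d", index);
            String found = m.group(1);
            if (!expected.equals(found)) {
                mismatches.add("expected " + expected + " but found " + found);
            }
            index++;
        }
        return mismatches;
    }

    public static void main(String[] args) throws Exception {
        // Defaults to the output file of the test program; pass another path to override.
        String path = args.length > 0 ? args[0] : "/tmp/gtest2.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        for (String mismatch : findMismatches(xml)) {
            System.out.println(mismatch);
        }
    }
}
```

Running it against both /tmp/gtest.xml and /tmp/gtest2.xml should show whether the input is already damaged or only the rewritten copy is. If a corrupted value happens to contain a stray tag, the counter can drift after that point, so the first reported mismatch is the most reliable one.<test>t>01</test><test>0002</test>");
if (bad.size() != 1) throw new AssertionError("unexpected mismatch count: " + bad);
if (!bad.get(0).equals("expected 0001 but found t>01")) throw new AssertionError(bad.get(0));
if (!TestElementCheck.findMismatches("<test>0000</test>\n<test>0001</test>").isEmpty()) throw new AssertionError("clean input reported as corrupted");
</test>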
I don't know exactly what is going on, but one well-known (?) problem with the JDK is that it often bundles old versions of libraries such as Xerces (XML parser) and Xalan (XSLT processor). Worse, these are sometimes custom builds that take an old version as a baseline and apply some set of patches on top, so it is hard to even verify what behavior to expect.
As a result, the usual recommendation is not to rely on whatever is bundled, but to explicitly use the official Xerces/Xalan releases, so that the version in use is known and you can at least check which known issues exist.
So you could try the latest Xerces and Xalan versions to rule out a bug that has already been fixed upstream.
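Before and after swapping libraries, it may be worth confirming which implementations JAXP actually picks up. The following is a minimal sketch using only standard JAXP and reflection calls; the class name JaxpImplCheck and the helper describe are hypothetical names chosen for this example.

```java
import java.security.CodeSource;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;

public class JaxpImplCheck {

    // Returns the implementation class name plus the location it was loaded from.
    // Classes loaded by the bootstrap loader usually report no code source at all.
    static String describe(Object impl) {
        Class<?> type = impl.getClass();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "no code source, typically the JDK's own copy"
                : source.getLocation().toString();
        return type.getName() + " (" + location + ")";
    }

    public static void main(String[] args) {
        System.out.println("Parser:      " + describe(DocumentBuilderFactory.newInstance()));
        System.out.println("Transformer: " + describe(TransformerFactory.newInstance()));
    }
}
```

With the bundled libraries, the class names normally start with com.sun.org.apache. With the official xercesImpl and xalan jars on the classpath, they should instead start with org.apache.xerces and org.apache.xalan. If the lookup does not pick them up on its own, setting the system properties javax.xml.parsers.DocumentBuilderFactory=org.apache.xerces.jaxp.DocumentBuilderFactoryImpl and javax.xml.transform.TransformerFactory=org.apache.xalan.processor.TransformerFactoryImpl should force the choice, assuming those jars are actually present.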