There are a dozen threads on this topic, but none of the answers works satisfactorily for me. It seems one needs to use a specific DOM implementation. However, I cannot get it to read the XML input:
@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + " xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + " xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + " <Query>\n"
            + " <query:CategorySchemeWhere>\n"
            + " \t\t\t\t\t <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
            + " </query:CategorySchemeWhere>\n"
            + " </Query>\n\n\n\n\n"
            + "</QueryMessage>";
    System.out.println(prettyPrintWithXercesDomLevel3(unformattedXml.getBytes("UTF-16")));
}
Here is the method:
public static String prettyPrintWithXercesDomLevel3(byte[] input) {
    try {
        //System.setProperty(DOMImplementationRegistry.PROPERTY,"org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");
        if (impl == null) {
            throw new RuntimeException("No DOMImplementation found !");
        }
        log.info(String.format("DOMImplementationLS: %s", impl.getClass().getName()));
        LSParser parser = impl.createLSParser(
                DOMImplementationLS.MODE_SYNCHRONOUS,
                //"http://www.w3.org/2001/XMLSchema");
                "http://www.w3.org/TR/REC-xml");
        log.info(String.format("LSParser: %s", parser.getClass().getName()));
        // Parse the raw bytes; the parser detects the encoding from the BOM / XML declaration.
        LSInput lsi = impl.createLSInput();
        lsi.setByteStream(new ByteArrayInputStream(input));
        Document doc = parser.parse(lsi);
        LSSerializer serializer = impl.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        output.setByteStream(baos);
        serializer.write(doc, output);
        // Decode with the charset set on the LSOutput, not the platform default.
        return baos.toString("UTF-8");
        // return serializer.writeToString(doc);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
However, the pretty-printing does not work. Any ideas?
The encoding of your Java source file must also match the encoding it is compiled and run with. If you are using Eclipse, the default file encoding is often Cp1252 (on Windows it is typically inherited from the platform default). The first thing I do when I install a new version of Eclipse is change the file encoding to UTF-8.
I ran your code and it worked fine, since my source file encoding was UTF-8.
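To make the code less sensitive to the source file and platform encoding, one option is to name the charset explicitly on both sides and to write non-ASCII characters as Unicode escapes. The following is only a minimal sketch, assuming the LS implementation bundled with the JDK; the class name ExplicitCharsetSketch, the helper roundTrip and the sample <city> document are made up for illustration and are not part of your code:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSSerializer;

class ExplicitCharsetSketch {

    // Hypothetical helper: parse the given bytes and serialize the document as UTF-8.
    static String roundTrip(byte[] input) {
        try {
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM Level 3 LS implementation found");
            }
            LSParser parser = impl.createLSParser(
                    DOMImplementationLS.MODE_SYNCHRONOUS, "http://www.w3.org/TR/REC-xml");
            LSInput lsi = impl.createLSInput();
            lsi.setByteStream(new ByteArrayInputStream(input));
            Document doc = parser.parse(lsi);

            LSSerializer serializer = impl.createLSSerializer();
            LSOutput output = impl.createLSOutput();
            output.setEncoding("UTF-8");
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            output.setByteStream(baos);
            serializer.write(doc, output);

            // Decode with the same charset that was set on the LSOutput,
            // instead of relying on the platform default.
            return new String(baos.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // \u00fc is a u with diaeresis; the escape keeps this source file pure ASCII,
        // so the encoding the file is saved in no longer matters.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><city>Z\u00fcrich</city>";
        // Encode with an explicit charset that matches the XML declaration.
        System.out.println(roundTrip(xml.getBytes(StandardCharsets.UTF_16)));
    }
}
```

With the charsets spelled out like this, the result should not depend on how Eclipse saved the file or on the default charset of the JVM.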