Pretty-printing XML using Java

There are a dozen threads on this topic, but none of their answers work satisfactorily for me. It seems that a specific DOM implementation is required. However, I cannot get it to read the XML input:

@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + "    <Query>\n"
            + "        <query:CategorySchemeWhere>\n"
            + "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
            + "        </query:CategorySchemeWhere>\n"
            + "    </Query>\n\n\n\n\n"
            + "</QueryMessage>";

    System.out.println(prettyPrintWithXercesDomLevel3(unformattedXml.getBytes("UTF-16")));
}

Here is the method:

public static String prettyPrintWithXercesDomLevel3(byte[] input) {
    try {
        // System.setProperty(DOMImplementationRegistry.PROPERTY, "org.apache.xerces.dom.DOMImplementationSourceImpl");
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");
        if (impl == null) {
            throw new RuntimeException("No DOMImplementation found !");
        }

        log.info(String.format("DOMImplementationLS: %s", impl.getClass().getName()));

        LSParser parser = impl.createLSParser(
                DOMImplementationLS.MODE_SYNCHRONOUS,
                //"http://www.w3.org/2001/XMLSchema");
                "http://www.w3.org/TR/REC-xml");
        log.info(String.format("LSParser: %s", parser.getClass().getName()));
        LSInput lsi = impl.createLSInput();
        lsi.setByteStream(new ByteArrayInputStream(input));
        Document doc = parser.parse(lsi);

        LSSerializer serializer = impl.createLSSerializer();
        serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
        LSOutput output = impl.createLSOutput();
        output.setEncoding("UTF-8");
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        output.setByteStream(baos);
        serializer.write(doc, output);
        // Decode with the encoding set on the LSOutput, not the platform default.
        return baos.toString("UTF-8");
        // return serializer.writeToString(doc);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

However, the pretty-printing does not work. Any ideas?

There are 3 answers

1
markbernard On

The encoding of your Java source file must also match the encoding you run with. If you are using Eclipse, the default encoding is CP-1252 for some reason. The first thing I do after installing a new version of Eclipse is change the file encoding to UTF-8.

I used your code and it worked fine since my source file encoding was UTF-8.
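
A related pitfall is decoding bytes with a charset other than the one they were written in, for example by relying on the platform default. Here is a minimal sketch of naming the charset explicitly on both sides. The class and method names are hypothetical, and ISO-8859-1 merely stands in for any mismatching default:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from the question's code.
final class ExplicitCharsetDemo {

    // Encodes text as UTF-8 and decodes it again with the same, explicitly named charset.
    static String roundTripUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(bytes, 0, bytes.length);
        return new String(baos.toByteArray(), StandardCharsets.UTF_8);
    }

    // Decodes UTF-8 bytes with a mismatching charset to show the resulting corruption.
    static String decodeUtf8AsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String name = "M\u00fcller"; // escape keeps the source file pure ASCII
        System.out.println(roundTripUtf8(name).equals(name));      // true
        System.out.println(decodeUtf8AsLatin1(name).equals(name)); // false
    }
}
```

With the charset spelled out, the result no longer depends on the IDE or operating system the code happens to run on.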

1
Lahiru Rajeew Ananda On

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

/**
 *
 * @author lananda
 */
public class PrettyXmlWriter {

    public static void main(String... args) {
        String unformattedXml
                = "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
                + "<QueryMessage\n"
                + "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
                + "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
                + "    <Query>\n"
                + "        <query:CategorySchemeWhere>\n"
                + "   \t\t\t\t\t         <query:AgencyID>ECB\n\n\n\n</query:AgencyID>\n"
                + "        </query:CategorySchemeWhere>\n"
                + "    </Query>\n\n\n\n\n"
                + "</QueryMessage>";
        // Collapse every run of whitespace (including inside text content) into a single space.
        unformattedXml = unformattedXml.replaceAll("\\s+", " ");
        String format = format(unformattedXml);
        System.out.println(format);
    }

    public static String format(String xml) {
        try {
            final InputSource src = new InputSource(new StringReader(xml));
            final Node document = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(src).getDocumentElement();
            final Boolean keepDeclaration = Boolean.valueOf(xml.startsWith("<?xml"));

            // May need this: System.setProperty(DOMImplementationRegistry.PROPERTY, "com.sun.org.apache.xerces.internal.dom.DOMImplementationSourceImpl");
            final DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            final DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("LS");
            final LSSerializer writer = impl.createLSSerializer();
            writer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE); // Set to true to beautify the output.
            writer.getDomConfig().setParameter("xml-declaration", keepDeclaration); // Set to true to emit the XML declaration.
            return writer.writeToString(document);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

0
user1050755 On

Update: it seems that all whitespace is significant in XML: "Based on the W3C XML specification, the Oracle XML Developer's Kit (XDK) XML parsers, by default, preserves all whitespace." It is therefore quite reasonable NOT to make pretty-printing part of a public API. org.jdom2 provides a reasonable implementation:

@Test
public void testPrettyPrintConvertDomLevel3() throws UnsupportedEncodingException, JDOMException, IOException {
    String unformattedXml
            = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><QueryMessage\n"
            + "        xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\"\n"
            + "        xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + "    <Query>\n"
            + "        <query:CategorySchemeWhere>\n"
            + "   \t\t\t\t\t         <query:AgencyID>ECB \n </query:AgencyID>\n"
            + "        </query:CategorySchemeWhere>\n"
            + "    </Query>\n\n\n\n\n"
            + "</QueryMessage>";
    SAXBuilder builder = new SAXBuilder();
    Document doc = builder.build(new ByteArrayInputStream(unformattedXml.getBytes("UTF-16")));
    Format f = Format.getPrettyFormat();
    f.setLineSeparator(LineSeparator.NL);
    f.setTextMode(Format.TextMode.TRIM_FULL_WHITE);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    new XMLOutputter(f).output(doc, baos);
    assertEquals("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<QueryMessage xmlns=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message\" xmlns:query=\"http://www.SDMX.org/resources/SDMXML/schemas/v2_0/query\">\n"
            + "  <Query>\n"
            + "    <query:CategorySchemeWhere>\n"
            + "      <query:AgencyID>ECB \n"
            + " </query:AgencyID>\n"
            + "    </query:CategorySchemeWhere>\n"
            + "  </Query>\n"
            + "</QueryMessage>\n", baos.toString("UTF-8"));
}
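
For readers who would rather stay on the JDK's own DOM classes, the same idea can be sketched without JDOM: since whitespace-only text nodes count as content, drop them first and then let the serializer indent. This is only a rough sketch under assumptions. The class and method names are hypothetical, the cast assumes that the parser's DOM implementation also implements DOMImplementationLS (true for the JDK's bundled parser as far as I know), and the exact indentation of the output varies between JDK versions:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
final class WhitespaceStrippingPrinter {

    // Recursively removes text nodes that contain nothing but whitespace.
    static void stripWhitespaceOnlyText(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // fetch before a possible removal
            if (child.getNodeType() == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                node.removeChild(child);
            } else {
                stripWhitespaceOnlyText(child);
            }
            child = next;
        }
    }

    // Parses the XML, strips whitespace-only text nodes, and serializes with pretty-printing.
    static String prettyPrint(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            stripWhitespaceOnlyText(doc);

            // Assumption: the DOM implementation behind the parser supports Load and Save.
            DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            serializer.getDomConfig().setParameter("format-pretty-print", Boolean.TRUE);
            return serializer.writeToString(doc);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(prettyPrint("<a>\n\t\t<b>   <c>payload</c>\n\n\n</b>\n</a>"));
    }
}
```

Text nodes that carry real content are left untouched, so whitespace inside something like the AgencyID value survives, just as it does in the JDOM test above.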