Java XML transformation escapes the surrogate code units that represent supplementary characters


I am running a web application in the Tomcat 8.0 servlet container. While handling a request, I try to transform input data to XML with the code below. The first character of the input is the Unicode supplementary character U+16980, represented as the char pair \ud81a\udd80, and the second is another supplementary character, U+16990, represented as the char pair \ud81a\udd90.

    String text = "\ud81a\udd80 \ud81a\udd90";
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.newDocument();

    Element root = document.createElement("root");
    document.appendChild(root);
    Element node = document.createElement("sofa");

    node.appendChild(document.createTextNode(text));

    root.appendChild(node);

    Source xmlSource = new DOMSource(document);

    // create StreamResult for transformation result
    javax.xml.transform.Result result = new StreamResult(new FileOutputStream("text.xml"));

    // create TransformerFactory
    TransformerFactory transformerFactory = TransformerFactory.newInstance();

    // create Transformer for transformation
    Transformer transformer = transformerFactory.newTransformer();

    // run the transformation; the result is written to text.xml
    transformer.transform(xmlSource, result);

I was expecting: <root><sofa>&#92544; &#92560; � �</sofa></root>

But instead I get: <root><sofa>&#55322;&#56704; &#55322;&#56720; � �</sofa> </root>
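For reference, the two outputs differ only in the unit that gets escaped: the observed output has one numeric character reference per UTF-16 code unit (55322 = 0xD81A, 56704 = 0xDD80, 56720 = 0xDD90), while the expected output has one reference per code point (92544 = 0x16980, 92560 = 0x16990). The following is a minimal sketch of that difference, assuming the input is exactly the two characters described above separated by a single space. The class and method names are hypothetical, and this is not the transformer's actual implementation; it only reproduces the two escaping behaviours with plain `String` and `Character` calls.

```java
public class SurrogateEscapeSketch {

    // Hypothetical helper: escapes every surrogate code unit on its own,
    // which reproduces the observed (unwanted) output.
    static String escapePerCodeUnit(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isSurrogate(c)) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Hypothetical helper: escapes whole supplementary code points,
    // which yields the expected output.
    static String escapePerCodePoint(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isSupplementaryCodePoint(cp)) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by 2 chars for a surrogate pair, 1 otherwise.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed input: U+16980, a space, U+16990.
        String text = "\uD81A\uDD80 \uD81A\uDD90";

        // Combining a high and a low surrogate gives the code point.
        System.out.println(Character.toCodePoint('\uD81A', '\uDD80')); // 92544

        System.out.println(escapePerCodeUnit(text));  // &#55322;&#56704; &#55322;&#56720;
        System.out.println(escapePerCodePoint(text)); // &#92544; &#92560;
    }
}
```

A reference to a lone surrogate such as `&#55322;` does not denote a legal XML character, so the per-code-unit form is not well-formed XML; that is why the per-code-point form is the one expected here.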
