I am running a web application in the Tomcat 8.0 servlet container. During a request I try to transform input data to XML with the code below. The first character of the input is the Unicode supplementary character U+16980, represented as the char pair \ud81a\udd80, and the second is another supplementary character, U+16990, represented as the char pair \ud81a\udd90.
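// U+16980 and U+16990, written as their surrogate pairs and separated by a space
String text = "\ud81a\udd80 \ud81a\udd90";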
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
Document document = documentBuilder.newDocument();
Element root = document.createElement("root");
document.appendChild(root);
Element node = document.createElement("sofa");
node.appendChild(document.createTextNode(text));
root.appendChild(node);
Source xmlSource = new DOMSource(document);
// create StreamResult for transformation result
javax.xml.transform.Result result = new StreamResult(new FileOutputStream("text.xml"));
// create TransformerFactory
TransformerFactory transformerFactory = TransformerFactory.newInstance();
// create Transformer for transformation
Transformer transformer = transformerFactory.newTransformer();
// transform and deliver content to client
transformer.transform(xmlSource, result);
I was expecting: <root><sofa>𖦀 𖦐</sofa></root>
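But instead I get output in which each supplementary character is replaced by two separate units that do not display as a valid character: <root><sofa>�� ��</sofa>
</root>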
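To make the problem easier to reproduce outside Tomcat, here is a minimal self-contained sketch. It checks that the surrogate pairs quoted above really correspond to U+16980 and U+16990, serializes the same DOM to a string instead of a file, and prints which TransformerFactory implementation is actually loaded. The class name SurrogateRepro and the helper methods toEscapes and serialize are names made up for this illustration. Omitting the XML declaration is my own choice to keep the output short, and the idea that the result depends on which transformer implementation is on the classpath is an assumption to test, not something I have confirmed.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SurrogateRepro {

    // Hypothetical helper: lists the UTF-16 code units of a code point
    // as four-digit hex escapes (backslash, 'u', hex digits).
    static String toEscapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    // Hypothetical helper: builds the same <root><sofa>text</sofa></root>
    // document as in the question and serializes it to a String.
    static String serialize(String text) throws Exception {
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = document.createElement("root");
        document.appendChild(root);
        Element node = document.createElement("sofa");
        node.appendChild(document.createTextNode(text));
        root.appendChild(node);

        TransformerFactory factory = TransformerFactory.newInstance();
        // Shows which implementation the container or classpath selected.
        System.out.println("TransformerFactory: " + factory.getClass().getName());

        Transformer transformer = factory.newTransformer();
        // Assumption for readability only: skip the XML declaration.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toEscapes(0x16980));
        System.out.println(toEscapes(0x16990));

        String text = new String(Character.toChars(0x16980))
                + " " + new String(Character.toChars(0x16990));
        System.out.println(serialize(text));
    }
}
```

Running this once as a plain Java program and once inside the web application should show whether the same TransformerFactory class is used in both cases and whether the serialized text differs between them.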