Let's create an XML file with two attribute values, each of which contains an extended Unicode character:
    XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
    try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ERROR_XML), "UTF-8"))) {
        XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(writer);
        xmlStreamWriter.writeStartDocument();
        xmlStreamWriter.writeCharacters("\n");
        // One element carrying the two attributes under test
        xmlStreamWriter.writeStartElement("start");
        xmlStreamWriter.writeAttribute("test1", "11");
        xmlStreamWriter.writeAttribute("test2", "22");
        xmlStreamWriter.writeEndElement();
        xmlStreamWriter.writeEndDocument();
        // Release the stream writer; the file itself is closed by try-with-resources
        xmlStreamWriter.close();
    }
The generated file looks like this:
<?xml version="1.0" ?>
<start test1="11" test2="22"></start>
If the file is read back in and the attribute values are examined:
    XMLInputFactory inputFactory = XMLInputFactory.newInstance();
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(ERROR_XML), "UTF-8"))) {
        XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(reader);
        xmlStreamReader.nextTag();
        if (XMLStreamReader.START_ELEMENT == xmlStreamReader.getEventType()
                && "start".equals(xmlStreamReader.getLocalName())) {
            System.out.println(xmlStreamReader.getAttributeValue(0));
            System.out.println(xmlStreamReader.getAttributeValue(1));
        }
    }
This prints:
11
22
Astonishingly, the second attribute value contains the extended Unicode character twice!
Each further use of an extended character in an attribute value increases this count. In one case I received attribute values with 12,000 identical characters instead of one. What is happening here?
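Since printing the values to a console can hide what is really in them, it may help to dump the code points of each attribute value instead. The following is only a minimal sketch: the class name `CodePointDump`, the helper `describe`, and the sample character U+2000B are illustrative assumptions and are not taken from the code above. An extended (supplementary) character occupies two Java `char`s, so a duplicated one shows up as an extra code point and two extra `char`s.

```java
public class CodePointDump {

    // Hypothetical helper: summarizes a string as char count, code point
    // count, and the list of code points in U+XXXX notation.
    static String describe(String value) {
        StringBuilder sb = new StringBuilder();
        sb.append("chars=").append(value.length())
          .append(" codePoints=").append(value.codePointCount(0, value.length()))
          .append(" [");
        for (int i = 0; i < value.length(); ) {
            int cp = value.codePointAt(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Supplementary characters occupy two chars (a surrogate pair)
            i += Character.charCount(cp);
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Assumed sample value: '1', the supplementary character U+2000B, '1'
        System.out.println(describe("1\uD840\uDC0B1"));
        // prints: chars=4 codePoints=3 [U+0031 U+2000B U+0031]
    }
}
```

Calling `describe(xmlStreamReader.getAttributeValue(1))` in the reading code would then show directly whether the character arrives once or several times.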
There is a bug in the corresponding class of the Java API, i.e. in the StAX reader that ships with the JDK.
You can use Woodstox ("woodstox.jar") to read the file correctly. All you need to do is modify the code that reads the XML file so that it uses Woodstox's StAX implementation instead of the JDK default.
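As a rough sketch of what such a change could look like (not necessarily the exact code originally used here), the input factory can be obtained from Woodstox instead of the JDK. The class `WoodstoxReaderSketch` and its helper methods are hypothetical names; `com.ctc.wstx.stax.WstxInputFactory` is assumed to be the Woodstox factory class on the classpath. The factory is loaded reflectively so that the sketch still compiles and runs, using the default factory, when the jar is absent.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class WoodstoxReaderSketch {

    // Assumption: Woodstox's XMLInputFactory implementation class
    static final String WOODSTOX_FACTORY = "com.ctc.wstx.stax.WstxInputFactory";

    // Returns the Woodstox factory if woodstox.jar is on the classpath,
    // otherwise falls back to the platform default factory.
    static XMLInputFactory createInputFactory() {
        try {
            return (XMLInputFactory) Class.forName(WOODSTOX_FACTORY)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | LinkageError e) {
            return XMLInputFactory.newInstance();
        }
    }

    // Reads the first attribute value of the root element.
    static String firstAttributeValue(String xml) {
        try {
            XMLStreamReader reader =
                    createInputFactory().createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance to the root start element
                return reader.getAttributeValue(0);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation was actually picked up
        System.out.println(createInputFactory().getClass().getName());
        System.out.println(firstAttributeValue("<start test1=\"11\"></start>"));
    }
}
```

In the reading code above, this would amount to replacing `XMLInputFactory.newInstance()` with `createInputFactory()`, or simply with `new WstxInputFactory()` if a compile-time dependency on Woodstox is acceptable. When the jar is on the classpath, `XMLInputFactory.newInstance()` may already select Woodstox on its own, so printing the factory class name is a quick way to check which implementation is in use.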
It then works correctly; I have tested it myself.
You can find "woodstox.jar" at http://wiki.fasterxml.com/WoodstoxDownload.