Can't Java's XMLStreamReader handle attribute values containing characters from higher Unicode planes?


Let's create an XML file with two attribute values, each of which contains an extended Unicode character (one outside the Basic Multilingual Plane):

XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ERROR_XML), "UTF-8"))) {
    XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(writer);

    xmlStreamWriter.writeStartDocument();
    xmlStreamWriter.writeCharacters("\n");
    xmlStreamWriter.writeStartElement("start");
    xmlStreamWriter.writeAttribute("test1", "11");
    xmlStreamWriter.writeAttribute("test2", "22");
    xmlStreamWriter.writeEndElement();
    xmlStreamWriter.writeEndDocument();
}

The generated file looks like this:

<?xml version="1.0" ?>
<start test1="11" test2="22"></start>

If the file is read back in and the attribute values are examined:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(ERROR_XML), "UTF-8"))) {
    XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(reader);

    xmlStreamReader.nextTag();
    if (XMLStreamReader.START_ELEMENT == xmlStreamReader.getEventType() &&
        "start".equals(xmlStreamReader.getLocalName()))
    {
        System.out.println(xmlStreamReader.getAttributeValue(0));
        System.out.println(xmlStreamReader.getAttributeValue(1));
    }
}

this prints:

11
22

Astonishingly, the second attribute value contains the extended Unicode character twice!

Every subsequent use of an extended character in an attribute value increases this count. In one case I received an attribute value with 12000 identical characters instead of one. What is happening here?
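
One way to confirm how often the character really occurs in a returned value is to dump the value as code points instead of printing it, since an extended character occupies two UTF-16 `char`s in a Java `String` and consoles often render it poorly. The following is a minimal sketch using only the JDK; the class name `CodePointDump` and the sample character U+1F600 are illustrative choices and not part of the original code.

```java
import java.util.stream.Collectors;

final class CodePointDump {

    /** Number of Unicode code points (not UTF-16 chars) in the value. */
    static int codePoints(String value) {
        return value.codePointCount(0, value.length());
    }

    /** Renders each code point as U+XXXX so repeated characters are easy to spot. */
    static String describe(String value) {
        return value.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // U+1F600 is an arbitrary supplementary character chosen for illustration.
        String sample = "1" + new String(Character.toChars(0x1F600)) + "1";
        System.out.println(sample.length());    // UTF-16 code units: 4
        System.out.println(codePoints(sample)); // code points: 3
        System.out.println(describe(sample));   // U+0031 U+1F600 U+0031
    }
}
```

Passing each result of `getAttributeValue(...)` to `describe` would show whether the reader returned the extended character once or several times.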


There is 1 answer

Tareq

There is a bug in the corresponding class of the Java API.

You can use "woodstox.jar" to read the file correctly. All you need to do is modify the code that reads the XML file as follows:

  • XMLStreamReader2 instead of XMLStreamReader
  • XMLInputFactory2 instead of XMLInputFactory

It will then work correctly; I have tested it myself.
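
When switching parsers it helps to check which StAX implementation is actually in use. The sketch below uses only the standard `javax.xml.stream` API and prints the class name of the factory that `XMLInputFactory.newInstance()` returns. It assumes, without this being stated above, that Woodstox registers itself as a StAX provider so that the lookup returns a Woodstox class once the jar is on the classpath; without the jar it falls back to the JDK's built-in parser. The class name `StaxProviderCheck` and its helper methods are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

final class StaxProviderCheck {

    /** Class name of the XMLInputFactory implementation found at run time. */
    static String providerName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    /** Reads one attribute of the root element, looked up by its local name. */
    static String rootAttribute(String xml, String attributeName) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance to the root START_ELEMENT
                // A null namespace URI means the namespace is not checked.
                return reader.getAttributeValue(null, attributeName);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so callers need not declare a checked exception.
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(providerName());
        System.out.println(rootAttribute("<start test1=\"a\" test2=\"b\"></start>", "test2"));
    }
}
```

If the printed provider is still the JDK's own class, the replacement parser is not being picked up and the original behaviour should be expected.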

You can find "woodstox.jar" at http://wiki.fasterxml.com/WoodstoxDownload.