Can't Java XMLStreamReader have attribute values with higher Unicode planes?

Question

851 views Asked by markus falkhausen At 11 September 2013 at 18:24

Let's create an XML file with two attribute values which contain an extended Unicode character (one outside the Basic Multilingual Plane):

XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();

try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(ERROR_XML), "UTF-8"))) {
    XMLStreamWriter xmlStreamWriter = outputFactory.createXMLStreamWriter(writer);

    xmlStreamWriter.writeStartDocument();
    xmlStreamWriter.writeCharacters("\n");
    xmlStreamWriter.writeStartElement("start");
    xmlStreamWriter.writeAttribute("test1", "11");
    xmlStreamWriter.writeAttribute("test2", "22");
    xmlStreamWriter.writeEndElement();
    xmlStreamWriter.writeEndDocument();
}

The generated file looks like this:

<?xml version="1.0" ?>
<start test1="11" test2="22"></start>

If this file is read in again and the attribute values are examined:

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(ERROR_XML), "UTF-8"))) {
    XMLStreamReader xmlStreamReader = inputFactory.createXMLStreamReader(reader);

    // Advance to the root element and print both attribute values
    xmlStreamReader.nextTag();
    if (XMLStreamReader.START_ELEMENT == xmlStreamReader.getEventType() &&
        "start".equals(xmlStreamReader.getLocalName()))
    {
        System.out.println(xmlStreamReader.getAttributeValue(0));
        System.out.println(xmlStreamReader.getAttributeValue(1));
    }
}

This will print:

11
22

Astonishingly, the second attribute value contains the extended Unicode character twice!

Each further use of an extended character in an attribute value increases this count. In one case I received attribute values with 12000 identical characters instead of one. What is happening here?
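One way to confirm the duplication described above is to count code points rather than `char` values, since a character from a higher plane occupies two `char`s (a surrogate pair) in a Java `String`. The following is only a minimal sketch: the class name `SupplementaryCheck`, the helper `countCodePoint`, and the choice of U+1F600 as the test character are assumptions made for illustration, not part of the original code.

```java
final class SupplementaryCheck {

    // Counts how often the given code point occurs in the text.
    // String.codePoints() treats each surrogate pair as a single code point.
    static long countCodePoint(String text, int codePoint) {
        return text.codePoints().filter(cp -> cp == codePoint).count();
    }

    public static void main(String[] args) {
        // U+1F600 is a hypothetical stand-in for "an extended Unicode character".
        String extended = new String(Character.toChars(0x1F600));
        String value = "2" + extended + "2";

        System.out.println(value.length());                          // 4 chars (surrogate pair counts as 2)
        System.out.println(value.codePointCount(0, value.length())); // 3 code points
        System.out.println(countCodePoint(value, 0x1F600));          // 1 occurrence
    }
}
```

Applying `countCodePoint` to the result of `getAttributeValue(1)` would show whether the reader returned the character once or several times.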

Original Q&A

There is 1 answer

**Tareq** · Answer 1 · 2015-05-21T09:06:05+00:00

There is a bug in the corresponding class of the Java API.

You can use "woodstox.jar" to read the file correctly. All you need to do is modify the code that reads the XML file as follows:

XMLStreamReader2 instead of XMLStreamReader
XMLInputFactory2 instead of XMLInputFactory

It will work correctly. I have tested it myself.

You can find "woodstox.jar" at http://wiki.fasterxml.com/WoodstoxDownload.
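Before and after adding the jar, it may help to check which StAX implementation `XMLInputFactory.newInstance()` actually returns, because the factory is looked up at runtime. This is a rough sketch using only the standard `javax.xml.stream` API; the class name `StaxImplementationCheck` and the helper `inputFactoryClassName` are hypothetical names chosen for this example.

```java
import javax.xml.stream.XMLInputFactory;

final class StaxImplementationCheck {

    // Returns the class name of the XMLInputFactory implementation
    // that the StAX factory lookup selects at runtime.
    static String inputFactoryClassName() {
        return XMLInputFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // With only the JDK on the classpath this is the built-in parser;
        // with Woodstox on the classpath it is expected (an assumption here)
        // to be a Woodstox class such as com.ctc.wstx.stax.WstxInputFactory.
        System.out.println(inputFactoryClassName());
    }
}
```

If the printed class still belongs to the JDK after adding Woodstox, the jar is probably not on the classpath that the application actually uses.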
