Bad Characters when parsing GML in Java


I'm using the org.w3c.dom package to parse the gml schemas (http://schemas.opengis.net/gml/3.1.0/base/).

When I parse the gmlBase.xsd schema and then save it back out, the quote characters around GeometryCollections in the BagType complex type come out as garbled characters (see the code below).

Is there something wrong with how I'm parsing or saving the xml, or is there something in the schema that is off?

Thanks,

Curtis

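import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class GmlSchemaCopy // class name is illustrative
{
    public static void main(String[] args) throws IOException
    {
        File schemaFile = File.createTempFile("gml_", ".xsd");
        // Fetch the schema as a String and write that String to a temp file
        FileUtils.writeStringToFile(schemaFile, getSchema(new URL("http://schemas.opengis.net/gml/3.1.0/base/gmlBase.xsd")));
        System.out.println("wrote file: " + schemaFile.getAbsolutePath());
    }

    public static String getSchema(URL schemaURL)
    {
        try
        {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Read the whole stream into a String, then parse from that String
            Document doc = db.parse(new InputSource(new StringReader(IOUtils.toString(schemaURL.openStream()))));
            Element rootElem = doc.getDocumentElement();
            rootElem.normalize();

            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer();

            // Serialize the DOM into an in-memory byte buffer, then convert it to a String
            DOMSource source = new DOMSource(doc);
            ByteArrayOutputStream xmlOutStream = new ByteArrayOutputStream();
            StreamResult result = new StreamResult(xmlOutStream);
            transformer.transform(source, result);
            return xmlOutStream.toString();
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }

        return "";
    }
}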

Answer by Jon Skeet:

I'm suspicious of this line:

Document doc = db.parse(new InputSource(
     new StringReader(IOUtils.toString(schemaURL.openStream()))));

I don't know exactly what IOUtils.toString does here, but presumably it's assuming a particular encoding (the single-argument overload uses the platform default), without taking account of the XML declaration.
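To illustrate the kind of corruption a charset mismatch causes, here is a minimal sketch. It assumes (the question doesn't say) that the quotes are non-ASCII typographic quotes U+201C/U+201D stored as UTF-8, and it uses ISO-8859-1 as a stand-in for a mismatched platform default. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {

    // Hypothetical sample text: assumes typographic quotes U+201C/U+201D.
    static final String ORIGINAL = "\u201CGeometryCollections\u201D";

    // Encode as UTF-8 (assumed on-disk encoding), then decode with a
    // different charset, as a reader using the wrong default would.
    static String decodeWithWrongCharset(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.ISO_8859_1);
    }

    // Decoding the same bytes with the matching charset round-trips cleanly.
    static String decodeWithRightCharset(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeWithRightCharset(ORIGINAL).equals(ORIGINAL)); // prints true
        // Each 3-byte UTF-8 quote becomes 3 separate chars: 21 chars grow to 25
        System.out.println(decodeWithWrongCharset(ORIGINAL).length()); // prints 25
    }
}
```

Printing lengths rather than the garbled text avoids the console's own encoding muddying the demonstration.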

Why not just use:

Document doc = db.parse(schemaURL.openStream());
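A self-contained sketch of the difference, using an in-memory document instead of the real schema URL (the sample XML and class name are hypothetical). Handing the parser raw bytes lets it use the XML declaration; decoding to a String first with the wrong charset (ISO-8859-1 here, standing in for a mismatched default) means the declaration is ignored and the damage is already done.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseFromStreamDemo {

    // Hypothetical stand-in for gmlBase.xsd: a tiny UTF-8 document
    // whose text contains non-ASCII quotes (assumed for illustration).
    static final String TEXT = "\u201CGeometryCollections\u201D";
    static final byte[] XML_BYTES =
        ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc>" + TEXT + "</doc>")
            .getBytes(StandardCharsets.UTF_8);

    // Raw bytes: the parser detects the encoding from the XML declaration.
    static String parseBytes(byte[] xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new ByteArrayInputStream(xml)) {
            Document doc = db.parse(in);
            return doc.getDocumentElement().getTextContent();
        }
    }

    // Pre-decoded String: with a character stream the declared encoding
    // is ignored, so the wrongly decoded characters are parsed as-is.
    static String parseViaWrongString(byte[] xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        String decoded = new String(xml, StandardCharsets.ISO_8859_1);
        Document doc = db.parse(new InputSource(new StringReader(decoded)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseBytes(XML_BYTES).equals(TEXT));          // true
        System.out.println(parseViaWrongString(XML_BYTES).equals(TEXT)); // false
    }
}
```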

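Likewise, your FileUtils.writeStringToFile call doesn't appear to specify a character encoding... which encoding does it use, and which encoding is the StreamResult writing? (Note that xmlOutStream.toString() also decodes the bytes with the platform default encoding.)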
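On the output side, one option is to name the Transformer's output encoding explicitly and decode (or write) the bytes with that same charset. This is a minimal sketch with a hypothetical in-memory document and class name, not the asker's full program.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class SerializeWithEncodingDemo {

    // Hypothetical sample text with non-ASCII quotes.
    static final String TEXT = "\u201CGeometryCollections\u201D";

    // Serialize the DOM to bytes, stating the output encoding explicitly
    // instead of relying on the Transformer's default.
    static byte[] serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    // Parse from bytes, serialize, and decode with the same charset that
    // the Transformer was told to use (not the platform default).
    static String roundTrip(String text) throws Exception {
        byte[] in = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?><doc>" + text + "</doc>")
            .getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(in));
        return new String(serialize(doc), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(TEXT).contains(TEXT)); // true
    }
}
```

If the goal is just a file on disk, passing new StreamResult(schemaFile) to transform writes the bytes directly and skips the intermediate String altogether.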