XML to Java parser: How to parse attributes presented within a CDATA tag


I am currently extracting some data from an HP Quality Center SQL database, and some of the data I need in order to configure the correct presentation of other data is stored in XML format. I have a basic understanding of XML and have been able to parse most of the attributes and turn them into runtime objects that contain the necessary fields for further data retrieval. But I have not been able to extract the attributes inside a CDATA section. The data inside has to be handled programmatically at runtime, because it carries important information about which tables to search and which filters to apply.

I have a single-class runnable example that just prints one line for each field I have read into a Java object, and it fails as soon as I try to extract the CDATA attributes.

I have read numerous articles about what CDATA is, but none of them seem to mention a similar setup, where the inside of a CDATA-section clearly contains attributes.

So, is it possible to extract these attributes in a similar way to how I extract the other attributes? If so, how?

Thanks in advance.

Code (the XML string is a hardcoded example from the database):

import java.io.ByteArrayInputStream;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;


public class XMLParser {

    public static void main(String[] args){
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
                "<AnalysisDefinition Version=\"2.0\" " +
                    "GraphProviderId=\"QC.Graph.Provider\" " +
                    "GroupByField=\"TC_STATUS\" " +
                    "ForceRefresh=\"False\" " +
                    "SelectedProjects=\"CURRENT-PROJECT-UID\" " +
                    "SumOfField=\"\" TimeResolution=\"Day\" " +
                    "DisplayOptions=\"Regular\">" +

                    "<Filter " +
                        "FilterState=\"Custom\" " +
                        "FilterFormat=\"Frec\">" +

                        "<![CDATA[[Filter]{" +
                            "TableName:TESTCYCL," +
                            "ColumnName:TC_ASSIGN_RCYC," +
                            "LogicalFilter:\\00000047\\^URLAnonymized^," +
                            "VisualFilter:\\00000047\\^URLAnonymized^," +
                            "NO_CASE:" +
                            "}" +
                            "]]>" +
                        "</Filter>" +

                        "<DateRange " +
                            "PeriodType=\"Custom\" " +
                            "StartDate=\"2013,9,29\" " +
                            "EndDate=\"2013,10,14\" " +
                        "/>" +
                    "</AnalysisDefinition>";

        AnalysisDefinition ad = createFilterData(xml);      

        System.out.println("DisplayOptions: " + ad.getDisplayOptions());
        System.out.println("graphProviderID: " + ad.getGraphProviderId());
        System.out.println("GroupByField: " + ad.getGroupByField());
        System.out.println("SumOfField: " + ad.getSumOfField());
        System.out.println("TimeResolution: " + ad.getTimeResolution());
        System.out.println("Version: " + ad.getVersion());

        System.out.println("Filter: " + ad.getFilter());
        System.out.println("DateRange: " + ad.getDateRange());

        System.out.println("FilterState: " + ad.getFilter().getFilterState());
        System.out.println("FilterFormat: " + ad.getFilter().getFilterFormat());
        System.out.println("TableName: " + ad.getFilter().getTableName());


    }

    public static AnalysisDefinition createFilterData(String xml){

        AnalysisDefinition ad = new AnalysisDefinition();

        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);
        docFactory.setValidating(false);
        docFactory.setIgnoringElementContentWhitespace(true);
        Document doc = null;
        try {
            DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
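            // Encode explicitly as UTF-8 to match the encoding named in the XML declaration
            ByteArrayInputStream is = new ByteArrayInputStream(xml.getBytes(java.nio.charset.StandardCharsets.UTF_8));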
            doc = docBuilder.parse(is);

        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }

        NodeList nl = doc.getElementsByTagName("AnalysisDefinition");
        for(int i = 0, stop = nl.getLength(); i < stop; i++){
            Element e = (Element) nl.item(i);
            ad.setVersion(e.getAttribute("Version"));
            ad.setGraphProviderId(e.getAttribute("GraphProviderId"));
            ad.setGroupByField(e.getAttribute("GroupByField"));
            ad.setForceRefresh(Boolean.parseBoolean(e.getAttribute("ForceRefresh")));
            ad.setSumOfField(e.getAttribute("SumOfField"));
            ad.setTimeResolution(e.getAttribute("TimeResolution"));
            ad.setDisplayOptions(e.getAttribute("DisplayOptions"));
        }

        nl = doc.getElementsByTagName("Filter");
        for(int i = 0, stop = nl.getLength(); i < stop; i++){
            Element e = (Element) nl.item(i);
            Filter filter = new Filter();
            filter.setFilterState(e.getAttribute("FilterState"));
            filter.setFilterFormat(e.getAttribute("FilterFormat"));
            filter.setTableName(e.getAttribute("TableName"));

            ad.setFilter(filter);
        }   
        return ad;
    }
}

Answer by Michael Kay

CDATA means "character data", i.e. text with no markup. There are therefore no attributes in your CDATA; only text that can be interpreted as attributes if you choose. By wrapping them in CDATA you've instructed the XML parser not to interpret them in any way. If you do know the syntax of the data inside a CDATA section, whether it's XML or something else like JSON, you'll have to pass the text inside the CDATA to an appropriate parser to extract the structure.
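
As a rough illustration of that last step, here is a minimal sketch in Java. It reads the text of the `Filter` element with the standard DOM `getTextContent()` call (which includes CDATA content) and then hands that text to a small hand-written parser. The class name `CdataFilterParser` and both helper methods are hypothetical, and the `[Filter]{key:value,...}` layout is only assumed from the sample in the question. If values can themselves contain commas (the `\00000047\` prefix in the sample might be a length marker, which would suggest so), a more careful parser would be needed.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the code in the question.
public class CdataFilterParser {

    /** Returns the text of the first <Filter> element; CDATA content is included. */
    public static String extractFilterText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element filter = (Element) doc.getElementsByTagName("Filter").item(0);
            return filter.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read the Filter element", e);
        }
    }

    /**
     * Parses text of the assumed form "[Filter]{key:value,key:value,...}".
     * Each pair is split at its first colon; commas inside values are NOT supported.
     */
    public static Map<String, String> parseFilterText(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        int open = text.indexOf('{');
        int close = text.lastIndexOf('}');
        if (open < 0 || close < open) {
            return fields; // not in the assumed format: return an empty map
        }
        for (String pair : text.substring(open + 1, close).split(",")) {
            int colon = pair.indexOf(':');
            if (colon > 0) {
                fields.put(pair.substring(0, colon).trim(), pair.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Shortened version of the XML from the question.
        String xml = "<AnalysisDefinition>"
                + "<Filter FilterState=\"Custom\" FilterFormat=\"Frec\">"
                + "<![CDATA[[Filter]{TableName:TESTCYCL,ColumnName:TC_ASSIGN_RCYC,NO_CASE:}]]>"
                + "</Filter></AnalysisDefinition>";
        Map<String, String> fields = parseFilterText(extractFilterText(xml));
        System.out.println("TableName: " + fields.get("TableName"));
        System.out.println("ColumnName: " + fields.get("ColumnName"));
    }
}
```

In the question's code, the result of such a lookup could then be passed to `filter.setTableName(...)` in place of `e.getAttribute("TableName")`, which returns an empty string because `TableName` is not an XML attribute of `Filter`.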