How to parse nested XML with XSD and create a structured List of HashMaps in Java


I'm working with Java to process XML data. My goal is to transform a nested XML file (with an associated XSD) into a List of HashMaps, where each HashMap represents one row of data, with XML tag names as the column names and the element text as the corresponding values.

I've attempted to process the XML using Java's DOM parser (DocumentBuilderFactory and DocumentBuilder).

My code iterates through all nodes (nodeList) in the XML document. For each element node, I check and store any attributes. Then, I iterate through child nodes, extracting the tag name and text content to build a HashMap (currentRecord) representing a single data row. I maintain a recordList to store the completed HashMaps.

    for (int i = 0; i < nodeList.getLength(); i++) {
        Node currentNode = nodeList.item(i);

        if (currentNode.getNodeType() == Node.ELEMENT_NODE) {
            Element currentElement = (Element) currentNode;

            // Check if there are attributes, if yes, add them to currentRecord map
            NamedNodeMap attributes = currentElement.getAttributes();
            for (int k = 0; k < attributes.getLength(); k++) {
                Node attribute = attributes.item(k);
                if (attribute.getNodeValue() != null && !attribute.getNodeName().equals("xmlns:xsi")) {
                    currentRecord.put(attribute.getNodeName(), attribute.getNodeValue());
                }
            }

            NodeList childNodes = currentElement.getChildNodes();
            for (int j = 0; j < childNodes.getLength(); j++) {
                Node childNode = childNodes.item(j);
                if (childNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element childElement = (Element) childNode;
                    String nodeName = childElement.getNodeName();
                    String nodeValue = childElement.getTextContent();

                    // Replace the same tag value.
                    if (nodeValue != null && !nodeValue.contains("\n")) {
                        currentRecord.put(nodeName, nodeValue);
                    }
                }
            }
            recordList.add(new HashMap<>(currentRecord));
        }
    }


XML Snippet:
<?xml version="1.0" encoding="UTF-8"?>
<Banks>
    <Bank>
        <Name>Goldman Sachs</Name>
        <Location>London</Location>
        <Clients>
            <Client>
                <ClientName>John Doe</ClientName>
                <AccountType>Savings</AccountType>
                <Services>
                    <Service>
                        <Type>Online Banking</Type>
                        <Status>Active</Status>
                    </Service>
                    <Service>
                        <Type>Investment</Type>
                        <Status>Inactive</Status>
                    </Service>
                </Services>
            </Client>
            <Client>
                <ClientName>Jane Smith</ClientName>
                <AccountType>Checking</AccountType>
            </Client>
        </Clients>
    </Bank>
</Banks>

Expected List of HashMaps (example):
[
{
    "Name": "Goldman Sachs",
    "Location": "London",
    "ClientName": "John Doe",
    "AccountType": "Savings",
    "Type": "Online Banking",
    "Status": "Active"
},
{
    "Name": "Goldman Sachs",
    "Location": "London",
    "ClientName": "John Doe",
    "AccountType": "Savings",
    "Type": "Investment",
    "Status": "Inactive"
}
]

There is 1 answer

Michael Kay

It looks like you're trying to flatten the XML into simple non-normalised records, and then convert those records to Java maps.

That kind of flattening is only possible if (as in your example) you have a hierarchy where no element is the owner of multiple one-to-many relationships. For example, a structure like

<book>
  <author>John</author>
  <author>Jane</author>
  <editor>Mary</editor>
  <editor>Mark</editor>
</book>

cannot be flattened using this kind of approach. This is relevant because you seem to be writing code that is supposed to work with any XML vocabulary, and you have failed to specify what it is supposed to do in the general case.

You need to think a little bit harder about your requirements before trying to write code.
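
For the restricted case described above (each element owns at most one one-to-many relationship), the flattening might be sketched roughly as follows. This is a minimal illustration, not a general solution: the class name `XmlFlattener` and the methods `flatten`, `walk` and `hasElementChildren` are hypothetical, attributes and the XSD are ignored, and the sketch simply rejects input that has repeated leaf tags or more than one kind of nested child under the same parent (such as the `<book>` example). A tag name that appears at two levels would overwrite the outer value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: flattens XML in which every element has at most
// one kind of nested (non-leaf) child element.
final class XmlFlattener {

    static List<Map<String, String>> flatten(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        List<Map<String, String>> rows = new ArrayList<>();
        walk(root, new LinkedHashMap<>(), rows);
        return rows;
    }

    // Copies the values inherited from ancestors, adds this element's leaf
    // children, then either emits a row or recurses into nested children.
    private static void walk(Element element, Map<String, String> inherited,
                             List<Map<String, String>> rows) {
        Map<String, String> record = new LinkedHashMap<>(inherited);
        List<Element> nested = new ArrayList<>();
        Set<String> leafNames = new HashSet<>();
        Set<String> nestedNames = new HashSet<>();

        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text, comments, etc.
            }
            Element child = (Element) n;
            if (hasElementChildren(child)) {
                nested.add(child);
                nestedNames.add(child.getTagName());
            } else {
                // Simplifying assumption: a repeated leaf tag is rejected
                // rather than expanded into one row per occurrence.
                if (!leafNames.add(child.getTagName())) {
                    throw new IllegalArgumentException(
                            "Repeated leaf <" + child.getTagName() + "> cannot be flattened");
                }
                record.put(child.getTagName(), child.getTextContent().trim());
            }
        }

        if (nestedNames.size() > 1) {
            throw new IllegalArgumentException(
                    "<" + element.getTagName() + "> has several kinds of nested children");
        }
        if (nested.isEmpty()) {
            rows.add(record); // deepest level reached: one complete row
        } else {
            for (Element child : nested) {
                walk(child, record, rows);
            }
        }
    }

    private static boolean hasElementChildren(Element element) {
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }
}
```

Calling `XmlFlattener.flatten(xml)` on the Banks snippet from the question would yield the two John Doe rows shown in the expected output, plus a third row for Jane Smith that contains only `Name`, `Location`, `ClientName` and `AccountType`, because she has no `Services`. Whether such a row should be emitted, dropped, or padded with empty values is exactly the kind of requirement that has to be decided before the code is written.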