Efficiently unmarshaling a part of a large xml file with JAXB and XMLStreamReader


I want to unmarshal part of a large XML file. A solution for this already exists, but I want to improve it for my own implementation.

Please have a look at the following code: (source)

public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        StreamSource xml = new StreamSource("input.xml");
        XMLStreamReader xsr = xif.createXMLStreamReader(xml);
        xsr.nextTag();

        while (!xsr.getLocalName().equals("VersionList") && xsr.getElementText().equals("1.81")) {
            xsr.nextTag();
        }

I want to unmarshal input.xml (given below) for the node with versionNumber="1.81".

With the current code, the XMLStreamReader first checks the node with versionNumber="1.80", then checks all of its sub nodes, and only then moves on to the node with versionNumber="1.81", where it satisfies the exit condition of the while loop.

Since I only want to check the versionNumber nodes, iterating over their sub nodes is unnecessary, and for a large XML file iterating over all sub nodes of version 1.80 will take a long time. I want to check only the top-level VersionList nodes: if the first one (versionNumber="1.80") does not match, the XMLStreamReader should jump directly to the next one (versionNumber="1.81"). This does not seem achievable with xsr.nextTag(). Is there any way to iterate through the desired VersionList nodes only?

input.xml:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fileVersionListWrapper FileName="src.h">
    <VersionList versionNumber="1.80">
        <Reviewed>
            <commentId>v1.80(c5)</commentId>
            <author>Robin</author>
            <lines>47</lines>
            <lines>48</lines>
            <lines>49</lines>
        </Reviewed>
        <Reviewed>
            <commentId>v1.80(c6)</commentId>
            <author>Sujan</author>
            <lines>82</lines>
            <lines>83</lines>
            <lines>84</lines>
            <lines>85</lines>
        </Reviewed>
    </VersionList>
    <VersionList versionNumber="1.81">
        <Reviewed>
            <commentId>v1.81(c4)</commentId>
            <author>Robin</author>
            <lines>47</lines>
            <lines>48</lines>
            <lines>49</lines>
        </Reviewed>
        <Reviewed>
            <commentId>v1.81(c5)</commentId>
            <author>Sujan</author>
            <lines>82</lines>
            <lines>83</lines>
            <lines>84</lines>
            <lines>85</lines>
        </Reviewed>
    </VersionList>
</fileVersionListWrapper>

There is 1 answer

Kenneth Clark On BEST ANSWER

You can get the node from the XML using XPath.

XPath, the XML Path Language, is a query language for selecting nodes from an XML document. In addition, XPath may be used to compute values (e.g., strings, numbers, or Boolean values) from the content of an XML document.

Your XPath expression will be:

/fileVersionListWrapper/VersionList[@versionNumber='1.81']

meaning that only the VersionList elements whose versionNumber attribute is 1.81 are returned.

Java code

I have assumed that you have the XML as a String, so you will need something along these lines:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();    
InputSource inputSource = new InputSource(new StringReader(xml));
Document document = builder.parse(inputSource);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile("/fileVersionListWrapper/VersionList[@versionNumber='1.81']");
NodeList nl = (NodeList) expr.evaluate(document, XPathConstants.NODESET);   

Now you can simply loop through each node:

for (int i = 0; i < nl.getLength(); i++)
{
  System.out.println(nl.item(i).getNodeName());
}

To get the nodes back to XML, you will have to create a new Document and append the nodes to it.

  Document newXmlDocument = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
  Element root = newXmlDocument.createElement("fileVersionListWrapper");
  for (int i = 0; i < nl.getLength(); i++)
  {
    Node node = nl.item(i);
    Node copyNode = newXmlDocument.importNode(node, true);
    root.appendChild(copyNode);
  }
  newXmlDocument.appendChild(root);

Once you have the new document, run a serializer on it to get the XML.

// Serialize the new document (not the original one), so only the selected nodes are written out
DOMImplementationLS domImplementationLS = (DOMImplementationLS) newXmlDocument.getImplementation();
LSSerializer lsSerializer = domImplementationLS.createLSSerializer();
String xmlString = lsSerializer.writeToString(newXmlDocument);
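The steps so far (parse, select with XPath, copy into a new Document, serialize) can be put together as one self-contained sketch. The class name VersionListExtractor and the method extract are hypothetical names chosen for this illustration, and the version is concatenated straight into the XPath expression, which is only reasonable for trusted input. Note that this still loads the whole document into memory as a DOM tree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original answer.
public class VersionListExtractor {

    // Returns a new XML string containing only the VersionList elements
    // whose versionNumber attribute equals the given version.
    static String extract(String xml, String version) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        // Assumption: 'version' is trusted, so plain concatenation is acceptable here.
        String expression =
                "/fileVersionListWrapper/VersionList[@versionNumber='" + version + "']";
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList matches = (NodeList) xpath.evaluate(expression, document, XPathConstants.NODESET);

        // Copy the matching nodes under a fresh root element.
        Document result = builder.newDocument();
        Element root = result.createElement("fileVersionListWrapper");
        for (int i = 0; i < matches.getLength(); i++) {
            root.appendChild(result.importNode(matches.item(i), true));
        }
        result.appendChild(root);

        // Serialize the new document, not the original one.
        DOMImplementationLS ls = (DOMImplementationLS) result.getImplementation();
        LSSerializer serializer = ls.createLSSerializer();
        return serializer.writeToString(result);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fileVersionListWrapper FileName=\"src.h\">"
                + "<VersionList versionNumber=\"1.80\"><Reviewed><author>Robin</author></Reviewed></VersionList>"
                + "<VersionList versionNumber=\"1.81\"><Reviewed><author>Sujan</author></Reviewed></VersionList>"
                + "</fileVersionListWrapper>";
        System.out.println(extract(xml, "1.81"));
    }
}
```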

Now that you have your XML String, I have assumed you already have a JAXB class that looks similar to this:

@XmlRootElement(name = "fileVersionListWrapper")
public class FileVersionListWrapper
{
  private ArrayList<VersionList> versionListArrayList = new ArrayList<VersionList>();

  public ArrayList<VersionList> getVersionListArrayList()
  {
    return versionListArrayList;
  }

  @XmlElement(name = "VersionList")
  public void setVersionListArrayList(ArrayList<VersionList> versionListArrayList)
  {
    this.versionListArrayList = versionListArrayList;
  }
}

You can then simply use the JAXB unmarshaller to create the objects for you:

JAXBContext jaxbContext = JAXBContext.newInstance(FileVersionListWrapper.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
StringReader reader = new StringReader(xmlString);
FileVersionListWrapper fileVersionListWrapper = (FileVersionListWrapper) jaxbUnmarshaller.unmarshal(reader);
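Since the DOM approach above reads the whole file into memory, here is a rough streaming sketch closer to what the question asks for. It checks the versionNumber attribute on each VersionList start tag and skips the rest of a non-matching element by counting nesting depth, without calling getElementText() on any child. The class VersionListScanner and its methods are hypothetical names for this illustration. The parser still has to read through the skipped content, since StAX cannot jump ahead in the stream, but no objects are built for it. Once seekVersion returns true, the reader is positioned on the matching start tag and could be handed to JAXB via unmarshaller.unmarshal(xsr, VersionList.class), assuming a JAXB-annotated VersionList class exists.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original answer.
public class VersionListScanner {

    // Precondition: the reader is on a START_ELEMENT.
    // Consumes events up to and including the matching END_ELEMENT.
    static void skipElement(XMLStreamReader xsr) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = xsr.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Positions the reader on the first child of the root element that is a
    // VersionList with the requested versionNumber. Returns false if none is found.
    static boolean seekVersion(XMLStreamReader xsr, String version) throws XMLStreamException {
        xsr.nextTag(); // move onto the root element (fileVersionListWrapper)
        while (xsr.hasNext()) {
            if (xsr.next() == XMLStreamConstants.START_ELEMENT) {
                if ("VersionList".equals(xsr.getLocalName())
                        && version.equals(xsr.getAttributeValue(null, "versionNumber"))) {
                    return true;
                }
                skipElement(xsr); // not the one we want: skip its whole subtree
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<fileVersionListWrapper FileName=\"src.h\">"
                + "<VersionList versionNumber=\"1.80\"><Reviewed><lines>47</lines></Reviewed></VersionList>"
                + "<VersionList versionNumber=\"1.81\"><Reviewed><lines>82</lines></Reviewed></VersionList>"
                + "</fileVersionListWrapper>";
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        boolean found = seekVersion(xsr, "1.81");
        System.out.println("found = " + found);
        xsr.close();
    }
}
```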