Dom4j vs JAXB for reading and updating large and complex XML files


I have an XML file with a stable tree structure and more than 5000 elements.

A fraction of it is below:

<Companies>
    <Offices>
        <RevenueInfo>
            <TransactionId>14042015014606877</TransactionId>
            <Company>
                <Identification>
                    <GlobalId>25142400905</GlobalId>
                    <BranchId>373287734</BranchId>
                    <GeoId>874</GeoId>
                    <LastUpdated>2015-04-14T01:46:06.940</LastUpdated>
                    <RecordType>7785</RecordType>
                </Identification>
                <Info>
                    <DataEntry>
                        <EntryId>12345</EntryId>
                    </DataEntry>
                    <DataEntry>
                        <EntryId>34567</EntryId>
                    </DataEntry>
                    <DataEntry>
                        <EntryId>89076</EntryId>
                    </DataEntry>
                    <DataEntry>
                        <EntryId>13211</EntryId>
                    </DataEntry>
                </Info>

                ...more elements

            </Company>
        </RevenueInfo>
    </Offices>
</Companies>

I need to be able to update any value in the document based on user input and write a new XML file with the updated information. The user will pass the BranchId, the name of the element to update and, if the element occurs more than once, its position (for example, to update EntryId 12345 the user will pass 373287734 EntryId=1 010101).

I've been looking at JAXB. Creating the model classes for this kind of XML seems like a considerable effort, but it also seems like it would make locating the element to update and writing the result to a file a lot easier.

Dom4j seems to have good performance results too, but I'm not sure how parsing would work with it.

My question is, is JAXB the best approach in this case or can you suggest a better way to parse this type of XML?


There are 2 answers

1
forty-two On

Leaving performance and memory requirements aside, I would recommend trying XPath together with DOM4J (or JDOM, or even plain DOM). To select the company you could use an XPath expression like this:

"//Company[Identification/BranchId = '373287734']"

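Then, using the returned company element as the context node, you can get the element to be updated with a second, relative XPath expression. Note the leading dot (a bare // would search from the document root again) and the parentheses (so that the index counts all matches under the company rather than per parent element):

"(.//EntryId)[1]"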
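
To make the two-step lookup concrete, here is a minimal sketch using only the JDK's built-in DOM and javax.xml.xpath APIs (the "plain DOM" option). The class name XPathUpdateSketch and the update helper are hypothetical, and the input checks are an assumption about what the user is allowed to pass. It uses a relative expression of the form (.//name)[n] so that the second lookup stays inside the selected company.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class XPathUpdateSketch {

    // Hypothetical helper: applies one update and returns the new document as text.
    static String update(String xml, String branchId, String elementName,
                         int index, String newValue) throws Exception {
        // The values are concatenated into XPath expressions below, so reject
        // anything that is not a plain number / simple element name (assumed rules).
        if (!branchId.matches("\\d+")
                || !elementName.matches("[A-Za-z_][A-Za-z0-9_.-]*")
                || index < 1) {
            throw new IllegalArgumentException("Invalid update request");
        }

        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Step 1: find the company by its BranchId.
        Element company = (Element) xpath.evaluate(
                "//Company[Identification/BranchId = '" + branchId + "']",
                doc, XPathConstants.NODE);
        if (company == null) {
            throw new IllegalArgumentException("No company with BranchId " + branchId);
        }

        // Step 2: relative to that company, take the n-th matching element
        // in document order.
        Element target = (Element) xpath.evaluate(
                "(.//" + elementName + ")[" + index + "]",
                company, XPathConstants.NODE);
        if (target == null) {
            throw new IllegalArgumentException(
                    "No " + elementName + " at position " + index);
        }
        target.setTextContent(newValue);

        // Serialize the modified tree with the default identity transformer.
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

For the example in the question, the call would be update(xml, "373287734", "EntryId", 1, "010101"). To produce a new file instead of a string, the StreamResult can be constructed from a java.io.File.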
0
Michael Kay On

In my experience JAXB only works well when the schema is simple and stable. In other cases you are better off using a generic tree model. The main generic models in the Java world are DOM, JDOM2, DOM4J, XOM, AXIOM. My own preferences are JDOM2 and XOM; DOM4J seems to me overcomplex, and somewhat old-fashioned. But it depends what you are looking for.

But then, the application you describe looks an ideal candidate for an "XML end-to-end" or XRX approach - XForms, XSLT, XQuery, XProc. You don't need Java at all.
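
As a rough illustration of the XSLT route, the update can be written as an identity transform that replaces the content of a single element. The stylesheet below is XSLT 1.0 and could be run by any XSLT processor; it is driven here from the JDK's built-in javax.xml.transform only to show it working. The class name XsltUpdateSketch, the update helper and the parameter names (branchId, elementName, index, newValue) are all hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltUpdateSketch {

    // XSLT 1.0: copy everything unchanged, except the one target element,
    // whose content is replaced by $newValue.
    static final String STYLESHEET = String.join("\n",
        "<xsl:stylesheet version='1.0'",
        "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>",
        "  <xsl:param name='branchId'/>",
        "  <xsl:param name='elementName'/>",
        "  <xsl:param name='index'/>",
        "  <xsl:param name='newValue'/>",
        "  <!-- The n-th element with the given name inside the matching company -->",
        "  <xsl:variable name='target' select=\"(//Company[Identification/BranchId"
            + " = $branchId]//*[local-name() = $elementName])"
            + "[position() = number($index)]\"/>",
        "  <!-- Identity template for attributes, text, comments, etc. -->",
        "  <xsl:template match='@*|node()'>",
        "    <xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>",
        "  </xsl:template>",
        "  <!-- Elements: copy, but swap in the new value for the target -->",
        "  <xsl:template match='*' priority='1'>",
        "    <xsl:copy>",
        "      <xsl:apply-templates select='@*'/>",
        "      <xsl:choose>",
        "        <xsl:when test='generate-id() = generate-id($target)'>",
        "          <xsl:value-of select='$newValue'/>",
        "        </xsl:when>",
        "        <xsl:otherwise>",
        "          <xsl:apply-templates select='node()'/>",
        "        </xsl:otherwise>",
        "      </xsl:choose>",
        "    </xsl:copy>",
        "  </xsl:template>",
        "</xsl:stylesheet>");

    // Hypothetical helper: runs the stylesheet and returns the new document as text.
    static String update(String xml, String branchId, String elementName,
                         int index, String newValue) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        // User input is passed as stylesheet parameters (values), not spliced
        // into the XPath text.
        t.setParameter("branchId", branchId);
        t.setParameter("elementName", elementName);
        t.setParameter("index", String.valueOf(index));
        t.setParameter("newValue", newValue);

        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Because the user's values travel as parameters rather than being concatenated into expressions, this variant needs no input escaping; the cost is that every element in the document passes through the template on each run.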