Combining/merging similar XML structures within one file


I have the following structure in an XML-file:

<root name="name1">
  <layer1 name="name2">
    <layer2 attribute="sowhat">
    </layer2>
  </layer1>
</root>
<root name="name1">
  <layer1 name="name2">
    <layer2 attribute="justit">
    </layer2>
  </layer1>
</root>
<root name="name1">
  <layer1 name="name2">
    <layer2 attribute="yeaha">
    </layer2>
  </layer1>
</root>
<root name="name2123">
  <layer1 name="name2">
    <layer2 attribute="itis">
    </layer2>
  </layer1>
</root>

And I want to get a result that looks like:

<root name="name1">
  <layer1 name="name2">
    <layer2 attribute="sowhat"></layer2>
    <layer2 attribute="justit"></layer2>
    <layer2 attribute="yeaha"></layer2>
  </layer1>
</root>
<root name="name2123">
  <layer1 name="name2">
    <layer2 attribute="itis">
    </layer2>
  </layer1>
</root>

So I want to merge and combine nodes as far as possible. I haven't used XSLT yet; I tried it, but I don't get it, not even the general idea. Any other ideas or tools?

Thanks

Answer by Tomalak

For what it's worth, here is a way to do this in XSLT 1.0.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" />
  <xsl:strip-space elements="*" />

  <xsl:key name="name" match="*[@name]" use="
    concat(@name, '|', ancestor::*[1]/@name, '|', ancestor::*[2]/@name)
  " />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[@name]">
    <xsl:variable name="myKey" select="
      concat(@name, '|', ancestor::*[1]/@name, '|', ancestor::*[2]/@name)
    " />
    <xsl:variable name="myGroup" select="key('name', $myKey)" />

    <xsl:if test="generate-id() = generate-id($myGroup[1])">
      <xsl:copy>
        <xsl:copy-of select="@*" />
        <xsl:apply-templates select="$myGroup/*" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
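
An XSLT processor only accepts well-formed input, which means a single document element. The sample in the question has four top-level <root> elements, so a wrapper is needed. A sketch of such an input, assuming the wrapper is called <roots> (the name is an assumption, chosen to match the result below):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<roots>
  <root name="name1">
    <layer1 name="name2">
      <layer2 attribute="sowhat">
      </layer2>
    </layer1>
  </root>
  <root name="name1">
    <layer1 name="name2">
      <layer2 attribute="justit">
      </layer2>
    </layer1>
  </root>
  <root name="name1">
    <layer1 name="name2">
      <layer2 attribute="yeaha">
      </layer2>
    </layer1>
  </root>
  <root name="name2123">
    <layer1 name="name2">
      <layer2 attribute="itis">
      </layer2>
    </layer1>
  </root>
</roots>
```

With a command-line processor such as xsltproc, something like `xsltproc merge.xsl input.xml` should apply the stylesheet; both file names are placeholders.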

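Applied to your input, wrapped in a single <roots> document element so that it is well-formed, this outputs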

<roots>
  <root name="name1">
    <layer1 name="name2">
      <layer2 attribute="sowhat"/>
      <layer2 attribute="justit"/>
      <layer2 attribute="yeaha"/>
    </layer1>
  </root>
  <root name="name2123">
    <layer1 name="name2">
      <layer2 attribute="itis"/>
    </layer1>
  </root>
</roots>

A key feature of XSLT is the ability to express complex transformations in relatively few lines of code. The above transformation is under 30 lines of code, and you could squeeze it even more.

I think a crash course in XSLT goes beyond the scope of this answer. Besides that, there are countless crash courses in XSLT available all over the Internet.

Instead, I'll give a general overview of what happens here.

First off, I've defined two classes of elements for your input - those that are merge-able and those that are not. I've defined all elements that have a @name attribute to be merge-able.

  1. All normal nodes (those without a @name) are copied as they are. The first <xsl:template> does that (it's the identity template).
  2. I've defined a "merge-able group" of elements as those that share a common set of @name attribute values along their ancestors.
    • To do that I create the concatenation of all relevant @name attributes for all elements that have them.
    • For the time being, this transformation can handle groups that go 3 levels deep (concat(@name, '|', ancestor::*[1]/@name, '|', ancestor::*[2]/@name)).
    • Add more levels in the same fashion if necessary.
    • The group name (the key) for the parent of sowhat is name2|name1|; the same key applies to the parents of the other <layer2> elements in that logical group.
  3. Now whenever the XSLT engine encounters an element with a @name, it
    • calculates the key for that element ($myKey).
    • gets the group of elements that have the same key ($myGroup).
    • finds out if the current element is the first element in its group; if so, it copies it to the output, otherwise it outputs nothing.
    • effectively, this groups elements by their key (this technique is called Muenchian grouping).
    • then it takes a recursive step: it starts processing the children of all elements in that group ($myGroup/*).
    • effectively, this takes us back to square one and the algorithm starts over for those children.
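
To see the grouping keys for yourself, a small helper stylesheet can print the key of every element that has a @name. This is only a debugging sketch, not part of the merge itself; it reuses the same concat() expression as the <xsl:key> above and assumes the four <root> elements from the question sit inside a single <roots> document element.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Plain text output: one key per line -->
  <xsl:output method="text" />

  <xsl:template match="/">
    <!-- Visit every element with a @name, in document order -->
    <xsl:for-each select="//*[@name]">
      <!-- Same expression as the use attribute of the key in the merge stylesheet -->
      <xsl:value-of select="
        concat(@name, '|', ancestor::*[1]/@name, '|', ancestor::*[2]/@name)
      " />
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Under that assumption, each of the first three <root> elements prints name1|| and each of their <layer1> children prints name2|name1|, while the last pair prints name2123|| and name2|name2123|. Elements that print the same line end up in the same group.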

There are some assumptions/limitations in my code that might not necessarily align with your input.

  • The elements ought to be merged by their @name and not by some other property.
  • The elements with the same @name ancestry do not have special attributes, so throwing away every element but the first one in a certain group will not cause loss of data.
  • There is a finite nesting depth.
  • Mergeable elements are never the descendants of non-mergeable elements (no <layer> with a @name inside a <layer> without a @name)
  • Probably others that slip my mind right now.
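
Two of these limitations can be loosened with small changes. The following variant is a sketch, not a tested drop-in replacement: it adds a fourth level to the key (ancestor::*[3]) and copies the attributes of every element in a group instead of only the first one. It still assumes that merging by @name is what you want.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes" />
  <xsl:strip-space elements="*" />

  <!-- One more ancestor level than the original key -->
  <xsl:key name="name" match="*[@name]" use="
    concat(@name, '|', ancestor::*[1]/@name, '|', ancestor::*[2]/@name,
           '|', ancestor::*[3]/@name)
  " />

  <!-- Identity template: copies everything that is not merge-able -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*[@name]">
    <!-- Must be the same expression as in the key definition -->
    <xsl:variable name="myGroup" select="key('name',
      concat(@name, '|', ancestor::*[1]/@name, '|', ancestor::*[2]/@name,
             '|', ancestor::*[3]/@name)
    )" />

    <xsl:if test="generate-id() = generate-id($myGroup[1])">
      <xsl:copy>
        <!-- Attributes of all group members. If several members carry the
             same attribute name, the last one in document order wins. -->
        <xsl:copy-of select="$myGroup/@*" />
        <xsl:apply-templates select="$myGroup/*" />
      </xsl:copy>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The nesting depth is still finite, just one level deeper. Conflicting attribute values within a group are resolved silently, so data can still be lost when members disagree.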

Reading recommendations

  • template matching and the general working mechanisms of an XSLT processor
  • XSL default rules
  • XPath
  • XSL keys and Muenchian grouping
  • the identity template
  • the concept of the current node throughout the processing flow