Using XSLT to merge/concatenate adjacent tags that share an attribute value, without losing child tags

I have XHTML content containing adjacent anchor tags with the same href value, and I need to merge each such run into a single anchor. I've been trying to figure out a way to do this with an XSL template and have run into a wall.

I've set up my initial template based on a solution described in "Concatenating the content of tags that share an attribute value with XSLT." This almost gets me the result I need; however, the content of the adjacent anchor tags in my XHTML often includes child tags (e.g., <em></em>, <code></code>) among the text nodes, and these must be retained. They're being stripped out by my current template.

I realize the XPath expressions in my current solution are almost certainly overly complex, and to be honest, I don't fully understand how my template produces the result it does. I'm entirely open to other, simpler or more sensible approaches that will help me achieve the result I'm looking for. For background, I'm using the template in a Node.js app with the saxon-js package, which I think supports XSLT 2.0 in addition to 1.0.

Here's a representative sample of my XHTML content:

<section xmlns="http://www.w3.org/1999/xhtml">
  <h1>Page heading</h1>
<p>I'm baby irony brunch <a href="http://www.url1.com">prism,</a><a href="http://www.url1.com"> </a><a href="http://www.url1.com"><em>farm-to-table</em> blog</a> vegan before they sold out. Viral cliche occupy neutral milk hotel prism drinking vinegar forage farm-to-table ennui tumblr.</p>
  
<p>Biodiesel jawn locavore irony <a href="http://www.url2.com">neutral milk</a><a href="http://www.url3.com"> hotel.</a> Ethical drinking vinegar gastropub pinterest taxidermy messenger bag next level. Plaid hot chicken enamel pin 8-bit vaporware scenester migas celiac direct trade twee fit DIY gorpcore tofu tousled.</p>

<p>Vape same small batch, fixie mukbang JOMO <a href="http://www.url4.com">gochujang</a> solarpunk single-origin <a href="http://www.url4.com">coffee</a> 3 wolf moon stumptown freegan. Truffaut selvage copper mug portland.</p> 

<p>Viral yuccie drinking vinegar artisan. <a href="http://www.url5.com">Snackwave</a><a href="http://www.url5.com"> </a><a href="http://www.url5.com">taxidermy</a> cloud bread knausgaard artisan. Mustache YOLO unicorn poutine, leggings craft beer cold-pressed hexagon.</p>
</section>

Here's the template I'm using:

<xsl:stylesheet xmlns:xhtml="http://www.w3.org/1999/xhtml" 
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
                exclude-result-prefixes="xhtml"
                version="1.0">

<xsl:output method="xml" omit-xml-declaration="yes"/>

<xsl:template match="*|@*">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="//xhtml:a[@href=following-sibling::xhtml:a/@href and following-sibling::node()[1][not(self::text())]]|//xhtml:a[@href=preceding-sibling::xhtml:a/@href and preceding-sibling::node()[1][not(self::text())]]">
   <xsl:variable name="this-href" select="@href"/> 
   <xsl:if test="not(preceding-sibling::*[1][self::xhtml:a[@href = $this-href]])">
      <xsl:copy>
         <xsl:attribute name="href">
            <xsl:value-of select="$this-href"/>
         </xsl:attribute>
         <xsl:apply-templates select="self::*" mode="concatenate" />
      </xsl:copy>
   </xsl:if>
</xsl:template>

<xsl:template match="//xhtml:a[@href=following-sibling::xhtml:a/@href and following-sibling::node()[1][not(self::text())]]|//xhtml:a[@href=preceding-sibling::xhtml:a/@href and preceding-sibling::node()[1][not(self::text())]]" mode="concatenate">
  <xsl:variable name="this-href" select="@href"/> 
  <xsl:value-of select="." />
  <xsl:apply-templates select="following-sibling::*[1][self::xhtml:a[@href = $this-href]]|following-sibling::*[1][self::xhtml:a[@href = $this-href]]" mode="concatenate" />
</xsl:template>
</xsl:stylesheet>

Here is the output:

<section xmlns="http://www.w3.org/1999/xhtml">
  <h1>Page heading</h1>
<p>I'm baby irony brunch <a href="http://www.url1.com">prism, farm-to-table blog</a> vegan before they sold out. Viral cliche occupy neutral milk hotel prism drinking vinegar forage farm-to-table ennui tumblr.</p>
  
<p>Biodiesel jawn locavore irony <a href="http://www.url2.com">neutral milk</a><a href="http://www.url3.com"> hotel.</a> Ethical drinking vinegar gastropub pinterest taxidermy messenger bag next level. Plaid hot chicken enamel pin 8-bit vaporware scenester migas celiac direct trade twee fit DIY gorpcore tofu tousled.</p>

<p>Vape same small batch, fixie mukbang JOMO <a href="http://www.url4.com">gochujang</a> solarpunk single-origin <a href="http://www.url4.com">coffee</a> 3 wolf moon stumptown freegan. Truffaut selvage copper mug portland.</p> 

<p>Viral yuccie drinking vinegar artisan. <a href="http://www.url5.com">Snackwave taxidermy</a> cloud bread knausgaard artisan. Mustache YOLO unicorn poutine, leggings craft beer cold-pressed hexagon.</p>
</section>

Here's a fiddle showing the above.

This is very close to the result I need, except that I have to retain any child tags like <em></em> or <strong></strong> that may appear within the adjacent anchor tags with matching href values. Thank you for any help you can provide an XSLT newbie!
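
To make the target concrete, here is what the merged anchor in the first sample paragraph would presumably look like. This fragment is derived by hand from the sample input above, not produced by any template:

```xml
<!-- desired result for the first paragraph: one anchor, child <em> retained -->
<a href="http://www.url1.com">prism, <em>farm-to-table</em> blog</a>
```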

There are 2 answers

michael.hor257k (best answer)

It seems to me that something like this could work for you in a more elegant fashion:

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xpath-default-namespace="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="p">
    <xsl:copy>
        <!-- keep any attributes on the p element itself -->
        <xsl:apply-templates select="@*"/>
        <xsl:for-each-group select="node()" group-adjacent="string(@href)">
            <xsl:choose>
                <xsl:when test="current-grouping-key()">
                    <a href="{@href}">
                        <xsl:apply-templates select="current-group()/node()"/> 
                    </a>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:apply-templates select="current-group()"/> 
                </xsl:otherwise>  
            </xsl:choose>
        </xsl:for-each-group>
    </xsl:copy>                    
</xsl:template>

</xsl:stylesheet>
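
The same grouping idea can be loosened a little. The following is only a sketch under a few assumptions that are not in the question: that anchors may sit in parents other than p, that only a elements should ever be merged (so the grouping key is taken from self::a/@href rather than @href), and that any extra attributes on a merged link can be taken from the first anchor of the run.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.w3.org/1999/xhtml">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- identity transform: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- any element that has at least one anchor child carrying an href -->
  <xsl:template match="*[a/@href]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- key is the href for anchors and '' for every other node,
           so text or other elements between anchors break a run -->
      <xsl:for-each-group select="node()"
                          group-adjacent="string(self::a/@href)">
        <xsl:choose>
          <xsl:when test="current-grouping-key() != ''">
            <!-- context item is the first anchor of the run:
                 shallow-copy it, then pull in the children of the whole run -->
            <xsl:copy>
              <xsl:apply-templates select="@*, current-group()/node()"/>
            </xsl:copy>
          </xsl:when>
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the children of each run go through xsl:apply-templates rather than xsl:value-of, child elements such as em or code reach the identity template and are kept.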

Michael Kay

I suspect it's as simple as changing

<xsl:value-of select="." />

to

<xsl:copy-of select="child::node()"/>

in the mode="concatenate" template rule.

I can't help feeling that your following-sibling test conditions are far more complicated than they need to be, but perhaps that's because I haven't understood your requirements properly.
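
As a rough illustration of both points, here is a minimal XSLT 1.0 sketch that uses xsl:copy-of in the concatenate step and replaces the long match patterns with a test on the immediately adjacent sibling node. It assumes "adjacent" means that no node at all (not even whitespace text) sits between two anchors, and that attributes other than href can be taken from the first anchor of a run. Neither assumption is confirmed by the question.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xhtml="http://www.w3.org/1999/xhtml"
                exclude-result-prefixes="xhtml">

  <xsl:output method="xml" omit-xml-declaration="yes"/>

  <!-- identity transform: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- an anchor that starts a run (or stands alone): emit one anchor
       and let the concatenate mode collect the content of the run -->
  <xsl:template match="xhtml:a">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="." mode="concatenate"/>
    </xsl:copy>
  </xsl:template>

  <!-- an anchor directly preceded by an anchor with the same href has
       already been merged into that run, so it produces no output -->
  <xsl:template match="xhtml:a[preceding-sibling::node()[1][self::xhtml:a]/@href = @href]"/>

  <xsl:template match="xhtml:a" mode="concatenate">
    <!-- copy-of keeps child elements; value-of would flatten them to text -->
    <xsl:copy-of select="node()"/>
    <!-- continue only if the very next sibling node is an anchor
         with the same href -->
    <xsl:apply-templates
        select="following-sibling::node()[1][self::xhtml:a][@href = current()/@href]"
        mode="concatenate"/>
  </xsl:template>

</xsl:stylesheet>
```

Against the sample in the question, the first paragraph should come out with a single anchor, `<a href="http://www.url1.com">prism, <em>farm-to-table</em> blog</a>`, while the url2/url3 pair and the two url4 anchors (which are separated by text) stay as they are.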