XPath 1.0 - Make selection based on value of text spread over multiple nodes

<root>
  <div>
    <p>this text</p>
    <p><span>fo</span><span>ob</span><span>ar</span></p>
  </div>

  <div>
    <p>this text</p>
    <p><span>fo</span><span>b</span><span>ar</span></p>
  </div>

  <div>
    <p>this text</p>
    <p><span>fooba</span><span>r</span></p>
  </div>

  <div>
    <p><span>foo</span>this text<span>bar</span></p>
  </div>

  <div>
    <p><span>foo</span><img/><span>bar</span></p>
  </div>

  <div>
    <p><span>foo</span><span>bar</span><span>baz</span></p>
  </div>

  <div>
    <p>foobar</p>
  </div>
</root>

Given the XML above, what XPath 1.0 query would select the <div>s in which foobar appears either within a single <span> or split across multiple consecutive <span>s?

  • I only want to select the first and third <div>.
  • The second <div> contains fobar, not foobar.
  • In the fourth <div> the <span>s are not consecutive.
  • The fifth <div> has an <img> between the <span>s so they're no longer consecutive.
  • The text of the sixth is foobarbaz, not foobar.
  • The seventh has the correct text but not within <span>s.

I have tried using concat(), but that doesn't work because I would need to know the number of arguments in advance. Also, concat(//*, //*) is equivalent to concat(//*[1], //*[1]), because a node-set argument is converted to the string value of its first node only, which is not what I want.

This is within PHP so I only have XPath 1.0.

There are 2 answers

har07 On

You can try this XPath:

/root/div[contains(normalize-space(.), 'foobar')]

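Note that . evaluates to the string value of the context node. For an element, that is the concatenation of all of its descendant text nodes in document order, so text split across several <span>s is compared as a single string. Keep in mind that this only tests whether the <div>'s text contains foobar; it does not check that the text sits inside consecutive <span>s, nor what follows it.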
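To make the string-value rule concrete, here is a minimal Python sketch using only the standard library. It does not evaluate XPath; string_value and normalize_space are hypothetical helpers that approximate what . and normalize-space() produce for the first <div> of the question.

```python
import xml.etree.ElementTree as ET

# First <div> from the question's sample (whitespace between tags omitted).
div = ET.fromstring(
    "<div>"
    "<p>this text</p>"
    "<p><span>fo</span><span>ob</span><span>ar</span></p>"
    "</div>"
)

def string_value(element):
    """Concatenate every descendant text node in document order,
    which is how XPath 1.0 defines the string value of an element."""
    return "".join(element.itertext())

def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())

value = normalize_space(string_value(div))
print(value)              # this textfoobar
print("foobar" in value)  # True
```

The three <span>s contribute fo, ob and ar back to back, so the substring test sees foobar even though no single node contains it.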

Output in an XPath tester:

Element='<div>
  <p>this text</p>
  <p>
    <span>fo</span>
    <span>ob</span>
    <span>ar</span>
  </p>
</div>'
Element='<div>
  <p>this text</p>
  <p>
    <span>fooba</span>
    <span>r</span>
  </p>
</div>'
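To check the string-value test against all seven <div>s of the question, here is a minimal Python sketch (standard library only, whitespace between elements removed for brevity). loose_match emulates contains(normalize-space(.), 'foobar'). strict_match is a hypothetical helper, not part of this answer, and assumes the requirement means "some run of consecutive <span> children spells exactly foobar".

```python
import xml.etree.ElementTree as ET

SAMPLE = """<root>
  <div><p>this text</p><p><span>fo</span><span>ob</span><span>ar</span></p></div>
  <div><p>this text</p><p><span>fo</span><span>b</span><span>ar</span></p></div>
  <div><p>this text</p><p><span>fooba</span><span>r</span></p></div>
  <div><p><span>foo</span>this text<span>bar</span></p></div>
  <div><p><span>foo</span><img/><span>bar</span></p></div>
  <div><p><span>foo</span><span>bar</span><span>baz</span></p></div>
  <div><p>foobar</p></div>
</root>"""

def string_value(element):
    # Descendant text nodes concatenated in document order.
    return "".join(element.itertext())

def loose_match(div, needle):
    # Emulates contains(normalize-space(.), needle).
    return needle in " ".join(string_value(div).split())

def span_runs(p):
    """Yield the joined text of each run of consecutive <span> children.
    A run ends at a non-span element or at non-whitespace text."""
    run = []
    for child in p:
        if child.tag == "span":
            run.append(string_value(child))
        elif run:
            yield "".join(run)
            run = []
        # Non-whitespace text after this child also ends the current run.
        if run and (child.tail or "").strip():
            yield "".join(run)
            run = []
    if run:
        yield "".join(run)

def strict_match(div, needle):
    # Assumption: a run must equal the needle exactly (so foobarbaz fails).
    return any(run == needle for p in div.iter("p") for run in span_runs(p))

divs = ET.fromstring(SAMPLE).findall("div")
loose = [i for i, d in enumerate(divs, 1) if loose_match(d, "foobar")]
strict = [i for i, d in enumerate(divs, 1) if strict_match(d, "foobar")]
print(loose)   # [1, 3, 5, 6, 7]
print(strict)  # [1, 3]
```

On the full sample, the substring test therefore also accepts the fifth, sixth and seventh <div>. In XPath 1.0 itself, a predicate along the lines of /root/div[p[span][not(*[not(self::span)])][not(text()[normalize-space()])][normalize-space(.) = 'foobar']] may express a similar restriction for the narrower case where the <p> holds nothing but <span>s; treat that expression as an untested suggestion.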
DSHCS On

I had a document with paragraphs (<p>) whose string value (.) started with a prefix (question:). I needed to strip off the prefix and all of its ancestor elements, but retain the paragraph (<p>) and any elements following the prefix. The prefix could be distributed across more than one element, at different depths in the XML, and the solution was restricted to XSLT 1.0.

I found that by recursing across descendant::text() and keeping a running sum of the text nodes' string lengths, I could determine when I had reached the text node that contains the end of the prefix. Note that the template match pattern selects only paragraphs that start with the prefix, which is what allows the sum of the text node lengths alone to decide where to stop. You could also accumulate the actual string and use a different test (contains()) to decide when to stop.

Sample XML (excuse the complexity, needed for testing)

<?xml version="1.0" encoding="utf-8" ?>
<root>
    <p><d1><d2>q<a>u<b>e<c>s</c><d>t</d>i</b><e>o</e></a><f>n</f>:</d2></d1> text</p>
</root>

Sample XSL (the <trace> elements are only there to document what the templates do)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml" indent="yes"/>

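  <!-- Match only paragraphs whose string value starts with the 9-character prefix. -->
  <xsl:template match="/root/p[substring(.,1,9)='question:']">
      <trace info="{concat('ML descendant::text()[1]:',' name=',name(),', .=',.)}"/>
      <!-- Start the walk at the first text node inside the paragraph. -->
      <xsl:apply-templates select="descendant::text()[1]" mode="m1"/>
  </xsl:template>

  <!-- Visit text nodes in document order, carrying the total length seen so far. -->
  <xsl:template mode="m1" match="text()">
      <xsl:param name="length" select="0"/>
      <xsl:variable name="temp" select="$length+string-length()"/>

      <trace info="{concat('m1:',' name=',name(),', length=',$temp,', .=',.)}"/>
      <xsl:choose>
          <!-- Fewer than 9 characters consumed: continue with the next text node. -->
          <xsl:when test="$temp&lt;9">
            <xsl:apply-templates select="following::text()[1]" mode="m1">
                <xsl:with-param name="length" select="$temp"/>
            </xsl:apply-templates>
          </xsl:when>
          <!-- This text node contains the end of the prefix. -->
          <xsl:otherwise>
            <trace info="m1: prefix match"/>
          </xsl:otherwise>
      </xsl:choose>
  </xsl:template>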

</xsl:stylesheet>

Output

<?xml version="1.0" encoding="UTF-8"?>
    <trace info="ML descendant::text()[1]: name=p, .=question: text"/>
<trace info="m1: name=, length=1, .=q"/>
<trace info="m1: name=, length=2, .=u"/>
<trace info="m1: name=, length=3, .=e"/>
<trace info="m1: name=, length=4, .=s"/>
<trace info="m1: name=, length=5, .=t"/>
<trace info="m1: name=, length=6, .=i"/>
<trace info="m1: name=, length=7, .=o"/>
<trace info="m1: name=, length=8, .=n"/>
<trace info="m1: name=, length=9, .=:"/>
<trace info="m1: prefix match"/>
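For readers who want to follow the length-counting walk outside XSLT, here is a minimal Python sketch of the same idea using only the standard library. find_prefix_end is a hypothetical helper, not part of the stylesheet; it assumes itertext() visits text nodes in the same document order as the descendant::text() and following::text() steps above.

```python
import xml.etree.ElementTree as ET

# Same paragraph as the sample XML above.
SAMPLE = (
    "<root><p><d1><d2>q<a>u<b>e<c>s</c><d>t</d>i</b><e>o</e></a>"
    "<f>n</f>:</d2></d1> text</p></root>"
)
PREFIX = "question:"

def find_prefix_end(p, prefix):
    """Return (index, text) of the text node in which `prefix` ends,
    or None if the paragraph's string value does not start with `prefix`."""
    # Mirrors the template match: only paragraphs starting with the prefix.
    if not "".join(p.itertext()).startswith(prefix):
        return None
    consumed = 0
    # Walk the text nodes in document order, summing their lengths.
    for index, text in enumerate(p.itertext()):
        consumed += len(text)
        if consumed >= len(prefix):
            return index, text
    return None

p = ET.fromstring(SAMPLE).find("p")
print(find_prefix_end(p, PREFIX))  # (8, ':')
```

As in the trace output, the ninth text node (index 8, the colon) is where the running length reaches 9. Because the paragraph is checked up front, the length alone is enough to know where the prefix ends.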