XPath for first word?

Question

658 views Asked by Rares Rares-Liviu At 26 April 2021 at 14:10

For this HTML / XML:

<div class="contentBlock">
  <h2> </h2>
  <h1></h1>
  <h1>DBS055 - single  module packages</h1>
</div>

I want to extract with XPath only DBS055, not the entire text.

There is 1 answer

**kjhughes** · Accepted Answer · 2021-04-26T14:35:47+00:00

//h1[normalize-space()]/replace(normalize-space(),'^([\w\-]+).*', '$1')

This XPath 2.0 expression returns the first word of the string value of each h1 element whose string value contains at least one non-whitespace character.
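For readers without an XPath 2.0 processor at hand, the same logic can be sketched in plain Python. This is only an illustration of what the expression does, not a replacement for it: the names `SAMPLE`, `normalize_space`, and `first_words` are made up for this example, and Python's `\w` is not defined identically to the `\w` of XPath regular expressions, so edge cases may differ.

```python
import re
import xml.etree.ElementTree as ET

# The markup from the question (it happens to be well-formed XML).
SAMPLE = """<div class="contentBlock">
  <h2> </h2>
  <h1></h1>
  <h1>DBS055 - single  module packages</h1>
</div>"""


def normalize_space(s):
    """Mimic XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def first_words(xml_text):
    """Return the first word of every h1 that has non-whitespace content."""
    root = ET.fromstring(xml_text)
    words = []
    for h1 in root.iter("h1"):
        text = normalize_space("".join(h1.itertext()))
        if text:  # plays the role of the [normalize-space()] predicate
            # Same pattern as in the XPath replace() call; '$1' becomes r'\1'.
            words.append(re.sub(r"^([\w\-]+).*", r"\1", text))
    return words


print(first_words(SAMPLE))  # ['DBS055']
```

The empty `h1` is skipped by the `if text` check, just as the predicate skips it in the XPath version.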

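substring-before(
  concat(
    normalize-space(
      translate(//h1[normalize-space()][1], ',;/.', '    ')), ' '), ' ')

This XPath 1.0 expression approximates the more robust XPath 2.0 solution. Expand ',;/.' as necessary to cover the characters you consider word boundaries, and keep one space in the third argument for each of them: translate() deletes any character that has no counterpart there.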
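The nesting of the XPath 1.0 functions is easier to follow when each one is written out as a small step. Below is a rough Python sketch of that pipeline. The helper names (`xpath_translate`, `substring_before`, `first_word`) and the second input string are hypothetical, chosen only for illustration. The sketch follows the XPath 1.0 rule that translate() removes characters lacking a counterpart in its third argument, which is why the replacement string is padded with one space per boundary character.

```python
import re


def normalize_space(s):
    """Mimic XPath normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


def xpath_translate(s, src, dst):
    """Mimic XPath translate(): map src[i] to dst[i], delete unmatched chars."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:  # the first occurrence in src wins
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)


def substring_before(s, sep):
    """Mimic XPath substring-before(): empty string when sep is absent."""
    head, found, _ = s.partition(sep)
    return head if found else ""


def first_word(text, boundaries=",;/."):
    # Turn every boundary character into a space (one space per character).
    spaced = xpath_translate(text, boundaries, " " * len(boundaries))
    # Append a space so a one-word string still has something to split on.
    return substring_before(normalize_space(spaced) + " ", " ")


print(first_word("DBS055 - single  module packages"))  # DBS055
print(first_word("DBS055/v2, draft"))                  # DBS055
# With a single space as the third argument, '/' is deleted, not replaced:
print(xpath_translate("DBS055/v2, draft", ",;/.", " "))  # DBS055v2  draft
```

The trailing space added before `substring_before` corresponds to the `concat(..., ' ')` in the XPath expression; without it, a heading that consists of a single word would yield an empty result.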

Explanation: