Get count of specific nodes between two specific sibling nodes


I'm using HtmlAgilityPack to get a filtered DOM of <h2> and <h3> nodes. Using XPath 1.0 (picked up in a crash course this week), I need to get the number of <h3>s (the number varies) that sit between sibling <h2>s, as follows:

<div>

  <h2>heading 1</h2>

  <h3>sub 1.1</h3>
  <h3>sub 1.2</h3>

  <h2>heading 2</h2>

  <h3>sub 2.1</h3>

  <h2>heading 3</h2>
  ....

</div>

When I iterate (using C#) through the filtered nodes, I want the exact number of <h3>s that come after a <h2> and before the next <h2>. When I use the following, I get all the <h3>s as the result:

int countH3 = n.SelectNodes("./preceding-sibling::h2[2]/following-sibling::h2[3]/preceding-sibling::h3").Count(); //the [position] is set dynamically

For the node structure above, I would like the result of that line to be:

countH3 = 1

but it is:

countH3 = 3
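
The count comes out as 3 because the final step, preceding-sibling::h3, is not bounded by any <h2>: it selects every <h3> sibling before the context node, all the way back to the start of the parent. One way to bound it is to look at each <h3> and count how many <h2> siblings precede it; the <h3>s that belong to the k-th <h2> are exactly those with k preceding <h2>s. The following is only a sketch under stated assumptions: it uses System.Xml rather than HtmlAgilityPack so that it runs without extra packages, CountH3After is a hypothetical helper name, and the third heading is given no <h3>s because the sample elides that part. The XPath strings are plain XPath 1.0 and should carry over to HtmlAgilityPack, with one caveat: its SelectNodes returns null rather than an empty collection when nothing matches.

```csharp
using System;
using System.Xml;

// Sample markup from the question; heading 3 has no <h3>s here (assumption).
var doc = new XmlDocument();
doc.LoadXml(
    "<div>" +
    "<h2>heading 1</h2><h3>sub 1.1</h3><h3>sub 1.2</h3>" +
    "<h2>heading 2</h2><h3>sub 2.1</h3>" +
    "<h2>heading 3</h2>" +
    "</div>");

// Hypothetical helper: number of <h3> siblings after this <h2> and before the next <h2>.
int CountH3After(XmlNode h2)
{
    // 1-based position of this <h2> among its <h2> siblings.
    int k = h2.SelectNodes("preceding-sibling::h2").Count + 1;

    // An <h3> belongs to this <h2> when exactly k <h2> siblings precede it.
    return h2.SelectNodes(
        $"following-sibling::h3[count(preceding-sibling::h2) = {k}]").Count;
}

foreach (XmlNode h2 in doc.SelectNodes("/div/h2"))
{
    Console.WriteLine($"{h2.InnerText}: {CountH3After(h2)}");
}
```

Because the bound is expressed per <h3>, this also works for the last <h2>, which has no following <h2> to stop at.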

I've found many similar SO questions about "sibling nodes between sibling nodes", and I have to thank @LarsH for his comment on another question that /preceding::h3 returns ALL <h3>s, which helped explain the issue. I think I may need the Kayessian method of node-set intersection, but I get an "invalid token" error when I include the | union operator, as follows:

countH3 = n.SelectNodes("./h2[2]/following-sibling::h2[3]
   [count(.|./h2[2]/following-sibling::h2[3]/preceding-sibling::h3)=
   count(./h2[2]/following-sibling::h2[3]/preceding-sibling::h3)]").Count();

Any suggestions appreciated.
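
For reference, the Kayessian intersection has the general form $ns1[count(. | $ns2) = count($ns2)]: it keeps the nodes of $ns1 that are also in $ns2. For this problem $ns1 would be the <h3>s following the i-th <h2> and $ns2 the <h3>s preceding the (i+1)-th <h2>, so the predicate has to sit on the <h3> step; in the attempt above it sits on an <h2> step, so it would filter <h2>s rather than <h3>s even once it parses. The sketch below is an illustration under assumptions, not a confirmed fix for the "invalid token" error: it uses System.Xml instead of HtmlAgilityPack so it runs without extra packages, CountH3Between is a hypothetical helper, and the paths are evaluated from the parent <div>.

```csharp
using System;
using System.Xml;

// Sample markup from the question; heading 3 has no <h3>s here (assumption).
var doc = new XmlDocument();
doc.LoadXml(
    "<div>" +
    "<h2>heading 1</h2><h3>sub 1.1</h3><h3>sub 1.2</h3>" +
    "<h2>heading 2</h2><h3>sub 2.1</h3>" +
    "<h2>heading 3</h2>" +
    "</div>");
XmlNode div = doc.DocumentElement;

// Hypothetical helper: <h3>s between the i-th and (i+1)-th <h2> child of parent (1-based i).
int CountH3Between(XmlNode parent, int i)
{
    string ns1 = $"h2[{i}]/following-sibling::h3";     // <h3>s after the i-th <h2>
    string ns2 = $"h2[{i + 1}]/preceding-sibling::h3"; // <h3>s before the next <h2>

    // Kayessian intersection: keep the nodes of ns1 that are also in ns2.
    // Inside the predicate the context node is an <h3>, so ns2 is reached via "../".
    string xpath = $"{ns1}[count(. | ../{ns2}) = count(../{ns2})]";
    return parent.SelectNodes(xpath).Count;
}

Console.WriteLine(CountH3Between(div, 1));
Console.WriteLine(CountH3Between(div, 2));
```

Note that this form needs a following <h2> to exist: for the last <h2> the second node-set is empty, so the intersection is always empty and any trailing <h3>s would not be counted.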
