xpath: extract the trailing text of a node

Question

Asked by user1424739 on 30 September 2020 at 21:57 · 184 views

I have an HTML file with the following content.

...
<table><tbody>
...
            <tr>
              <td><span class="myclass">C</span>
                <a href="/myurl" title="myclick">mytext</a>
                   tailing text
              </td>
            </tr>
...
</tbody></table>
...

I would like to extract this information and write it to a TSV file in the following format.

C<TAB>mytext<T>tailing text

So far, I have only worked out the following XPath expression, which extracts the first two columns. Could anybody show me how to extract the third column? Thanks.

xidel -s -e '//table/tbody/tr/td/join((span, a), x:cps(9))' - < infile.html

There are 2 answers

**Martin Honnen** · Answer 1 · 2020-09-30T22:24:57+00:00

If you use

//table/tbody/tr/td/string-join(node()[normalize-space()], x:cps(9))

you get three columns, but the last one may contain whitespace before and after the text. To avoid whitespace that your desired result does not show, normalize each node as well:

//table/tbody/tr/td/string-join(node()[normalize-space()]/normalize-space(), x:cps(9))
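For readers without xidel at hand, the same idea (keep every non-blank child node of the cell, normalize its whitespace, join with tabs) can be sketched in Python with the standard library. This is an illustration, not part of the original answer: the sample markup is modeled on the question's snippet, the helper name `cell_to_tsv` is hypothetical, and `xml.etree.ElementTree` only accepts well-formed input, so real-world HTML may need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# Sample markup modeled on the question's snippet (the "..." parts are omitted).
HTML = """<table><tbody>
  <tr>
    <td><span class="myclass">C</span>
      <a href="/myurl" title="myclick">mytext</a>
         tailing text
    </td>
  </tr>
</tbody></table>"""


def cell_to_tsv(td):
    """Join the non-blank child nodes of one <td> with tab characters."""
    # td.text is the text before the first child element; each child's
    # .tail is the text that follows it, i.e. its trailing text.
    pieces = [td.text or ""]
    for child in td:
        pieces.append("".join(child.itertext()))  # string value of the element
        pieces.append(child.tail or "")
    # Rough counterpart of normalize-space(): collapse whitespace and trim.
    cleaned = [" ".join(piece.split()) for piece in pieces]
    # Drop nodes that are empty after normalization, then join with tabs.
    return "\t".join(piece for piece in cleaned if piece)


root = ET.fromstring(HTML)  # the root element is <table>
rows = [cell_to_tsv(td) for td in root.findall("./tbody/tr/td")]
print("\n".join(rows))
```

The whitespace-only text between the span and the link is discarded by the final filter, which is the role `[normalize-space()]` plays in the XPath expression.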

**zx485** · Answer 2 · 2020-09-30T22:21:24+00:00

You can use this command:

xidel infile.html --xpath '//table/tbody/tr/td/string-join((span, "<TAB>", a, "<T>", a/following::text()[1]))'

or

xidel --xpath '//table/tbody/tr/td/string-join((span, "<TAB>", a, "<T>", a/following::text()[1]))' - < infile.html

Another approach is:

xidel infile.html --xpath '//table/tbody/tr/td/concat(span, "<TAB>", a, "<T>", a/following-sibling::text()[1])'

In all three cases, the output is:

C<TAB>mytext<T>tailing text
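The explicit-selection approach can also be sketched in Python, under the assumption that every cell of interest holds one `span` followed by one `a`. In `xml.etree.ElementTree`, the `.tail` attribute of an element holds the text node directly after its closing tag, which corresponds to `a/following-sibling::text()[1]` here. The sample markup and variable names are illustrative, and the sketch writes real tab characters through the `csv` module rather than literal placeholder strings.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input modeled on the question's snippet.
HTML = """<table><tbody>
  <tr>
    <td><span class="myclass">C</span>
      <a href="/myurl" title="myclick">mytext</a>
         tailing text
    </td>
  </tr>
</tbody></table>"""

root = ET.fromstring(HTML)
buffer = io.StringIO()  # stands in for an output file
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")

for td in root.iter("td"):
    span = td.find("span")
    link = td.find("a")
    if span is None or link is None:
        continue  # skip cells that do not have the assumed layout
    # link.tail is the text right after </a>; strip() removes the
    # surrounding indentation and line breaks.
    trailing = (link.tail or "").strip()
    writer.writerow([span.text or "", link.text or "", trailing])

tsv = buffer.getvalue()
print(tsv, end="")
```

Swapping `io.StringIO()` for `open("out.tsv", "w", newline="")` would write the rows to a file instead.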
