I have an HTML file with the following content.
...
<table><tbody>
...
<tr>
<td><span class="myclass">C</span>
<a href="/myurl" title="myclick">mytext</a>
tailing text
</td>
</tr>
...
</tbody></table>
...
I would like to extract the info and write to a TSV file in the following format.
C<TAB>mytext<TAB>tailing text
So far, I have only figured out this XPath code, which extracts the first two columns. Could anybody show me how to extract the 3rd column? Thanks.
xidel -s -e '//table/tbody/tr/td/join((span, a), x:cps(9))' - < infile.html
If you use
//table/tbody/tr/td/string-join(node()[normalize-space()], x:cps(9))
you get three columns, but the last one might contain whitespace before and after the text, so perhaps
//table/tbody/tr/td/string-join(node()[normalize-space()]/normalize-space(), x:cps(9))
is better, as it ensures you don't get whitespace that you haven't shown in your desired result.
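If you want to sanity-check the expected output without xidel, here is a minimal Python sketch of the same idea, using only the standard library. It is not how xidel works internally. It assumes the markup is well-formed enough for an XML parser (real-world HTML often is not), and it walks all descendant text fragments rather than only the child nodes of each td, which gives the same result for flat cells like the sample. The names SAMPLE and rows_to_tsv are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, with the elided "..." parts left out.
# Assumption: the input parses as XML; messy real HTML may need an HTML parser.
SAMPLE = """<table><tbody>
<tr>
<td><span class="myclass">C</span>
<a href="/myurl" title="myclick">mytext</a>
tailing text
</td>
</tr>
</tbody></table>"""


def rows_to_tsv(markup):
    """Return one tab-separated line per <td>, skipping whitespace-only pieces."""
    root = ET.fromstring(markup)  # root is the <table> element
    lines = []
    for td in root.iterfind("./tbody/tr/td"):
        # itertext() yields the text fragments inside the cell in document order.
        # " ".join(text.split()) trims and collapses whitespace, similar in
        # effect to XPath normalize-space().
        parts = [" ".join(text.split()) for text in td.itertext()]
        # Dropping empty strings plays the role of the [normalize-space()] filter.
        lines.append("\t".join(p for p in parts if p))
    return lines


if __name__ == "__main__":
    for line in rows_to_tsv(SAMPLE):
        print(line)
```

For the sample above this prints a single line with the three fields C, mytext and tailing text separated by tabs.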