select an xml element, ignore element name, print newline


I'd like to select the first element, but ignore its name in the output.

This is what I'm getting, after requesting the first url element from each input xml file:

% xmllint \
 --xpath '(//yandexsearch/response/results/grouping/group/doc/url)[1]' \
 *.response.ya.xml
<url>https://example.com/</url><url>https://example.net/</url><url>https://example.org/</url>

But this is what I want instead:

https://example.com/
https://example.net/
https://example.org/

Note that the idea is to select the value of the first <url> element from each input Yandex.XML file (Я Feel Lucky).

How do I do that with xpath?


There are 2 answers

Patrice M. answered:

Try instead:

(//yandexsearch/response/results/grouping/group/doc/url)[1]/text()

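XPath 1.0, which is what xmllint implements, only selects nodes or computes a single value. It cannot format a list of results, so you would do the concatenation (here, adding the newlines) in the code surrounding the XPath extraction.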
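As a rough sketch of doing the joining outside XPath: the loop below calls xmllint once per file and adds the newline itself. It wraps the question's expression in the XPath 1.0 string() function, so only the text value of the first <url> is returned. The function name first_url is a hypothetical helper, not part of xmllint, and the sketch assumes an xmllint build that supports --xpath.

```shell
#!/bin/sh
# first_url FILE...: print the text of the first <url> element of each file,
# one per line. (first_url is a hypothetical helper name.)
first_url() {
    for f in "$@"; do
        # string() turns the selected node into its text value, so the
        # <url> tags never reach the output. Skip files xmllint cannot read.
        value=$(xmllint --xpath \
            'string((//yandexsearch/response/results/grouping/group/doc/url)[1])' \
            "$f") || continue
        # Command substitution drops trailing newlines; printf adds exactly one.
        [ -n "$value" ] && printf '%s\n' "$value"
    done
}
```

It would be called as: first_url *.response.ya.xml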

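That being said, XPath 2.0 can do the joining itself, if that's available to you (xmllint only supports XPath 1.0). XPath string literals have no backslash escapes, so the newline is produced with codepoints-to-string(10):

string-join((//yandexsearch/response/results/grouping/group/doc/url)[1]/text(), codepoints-to-string(10))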

Also, this answer provides a couple of XSLT-based solutions.

cnst answered:

I ended up using awk to strip the <url> and </url> tags and print the text of each element on its own line, skipping the empty fields:

xmllint \
 --xpath '(//yandexsearch/response/results/grouping/group/doc/url)[1]' \
 *.response.ya.xml \
 | awk -F'</?url>' '{for(i=2;i<=NF;i++) if ($i != "") print $i}'
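
To see what the awk filter does without needing the XML files, here is a small sketch that feeds it the one-line output shown in the question. The wrapper name strip_url_tags is a hypothetical helper added for illustration.

```shell
#!/bin/sh
# strip_url_tags: read xmllint output on stdin and print the text of each
# <url> element on its own line. (strip_url_tags is a hypothetical name.)
strip_url_tags() {
    # -F sets a regex field separator matching either <url> or </url>.
    # The text between a pair of tags becomes a field; the gaps around the
    # tags become empty fields, which the != "" test skips.
    awk -F'</?url>' '{ for (i = 2; i <= NF; i++) if ($i != "") print $i }'
}

# Sample input copied from the question's xmllint output.
printf '%s\n' \
    '<url>https://example.com/</url><url>https://example.net/</url><url>https://example.org/</url>' \
    | strip_url_tags
```

With that input, the three URLs come out one per line, as requested in the question.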