I am trying to extract all "item" nodes that contain a g:custom_label_0 with the text value "2020-2021". So far, I have managed to find all nodes containing the child g:custom_label_0, but I can't figure out how to filter by the text value of that field.
Here is the example XML:
<item>
<description>[...]</description>
<g:availability>in stock</g:availability>
<g:brand>Barts</g:brand>
<g:condition>new</g:condition>
<g:custom_label_0>2020-2021</g:custom_label_0>
<g:id>108873/10-3</g:id>
<g:image_link>[...]</g:image_link>
<g:price>26.99 EUR</g:price>
<g:sale_price>26.99 EUR</g:sale_price>
<g:shipping>
<g:country>NL</g:country>
<g:price>4.50 EUR</g:price>
</g:shipping>
<g:shipping_weight>7.95</g:shipping_weight>
<link>[....]</link>
</item>
...
There are also nodes with values other than 2020-2021, but I only want to extract the complete item nodes that contain this text. Here is what I wrote to extract all nodes that have the field:
xmllint --xpath '//item["g:custom_label_0"]' myfile.xml
I tried adding a text filter with square brackets and so on, but I have the feeling the quotation marks around custom_label_0 might be causing trouble. Adding more filters inside the quotes is accepted (no error), but I can't add another pair of quotation marks inside them to filter on the string.
This works and throws no error:
xmllint --xpath '//item["g:custom_label_0[text()]"]' myfile.xml
If I want to filter on the text now, I need to use quotation marks again, and escaping them breaks the command. How can I filter further down to the text "2020-2021" when both types of quotation marks are already in use?
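You're right; the quotes around g:custom_label_0 are causing trouble. They turn it into a string literal, and a non-empty string is always true as a predicate, so the expression returns all item elements. The same goes for "g:custom_label_0[text()]", which is also just a string.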
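To see the difference, here is a small sketch that counts matches once with the quoted predicate and once with a test on the child element's name. The file name sample.xml and its contents are hypothetical, and the xmlns:g URI is an assumption (a Google Merchant-style feed).

```shell
#!/bin/sh
# Hypothetical sample feed; the xmlns:g URI is an assumption.
cat > sample.xml <<'EOF'
<rss xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id>108873/10-3</g:id>
      <g:custom_label_0>2020-2021</g:custom_label_0>
    </item>
    <item>
      <g:id>other-1</g:id>
      <g:custom_label_0>2019-2020</g:custom_label_0>
    </item>
    <item>
      <g:id>other-2</g:id>
    </item>
  </channel>
</rss>
EOF

# "g:custom_label_0" in quotes is a non-empty string literal, which is always
# true as a predicate, so every item is counted (3 for this sample).
quoted=$(xmllint --xpath 'count(//item["g:custom_label_0"])' sample.xml)

# Testing the child's local name only counts items that really have a
# custom_label_0 child (2 for this sample).
by_name=$(xmllint --xpath 'count(//item[*[local-name()="custom_label_0"]])' sample.xml)

echo "quoted predicate:     $quoted"
echo "local-name predicate: $by_name"
```

The item without a custom_label_0 child is still counted by the quoted version, which shows that the predicate never looked at the element at all.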
The g: is a namespace prefix. To bind a namespace to a prefix in xmllint, you have to use it in shell mode (see https://stackoverflow.com/a/8266075/317052 for an example).
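A rough sketch of the shell-mode route, under stated assumptions: sample.xml is a hypothetical file, and http://base.google.com/ns/1.0 is assumed to be the URI bound to g: (the question's snippet does not show the declaration, so check the xmlns:g attribute in the real feed and use that URI instead).

```shell
#!/bin/sh
# Hypothetical sample feed; the xmlns:g URI is an assumption.
cat > sample.xml <<'EOF'
<rss xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id>108873/10-3</g:id>
      <g:custom_label_0>2020-2021</g:custom_label_0>
    </item>
    <item>
      <g:id>other-1</g:id>
      <g:custom_label_0>2019-2020</g:custom_label_0>
    </item>
  </channel>
</rss>
EOF

# In --shell mode, "setns prefix=URI" registers the prefix for XPath
# evaluation; the URI must match the file's xmlns:g declaration.
# "cat <xpath>" then prints every node the expression selects.
result=$(printf '%s\n' \
  'setns g=http://base.google.com/ns/1.0' \
  'cat //item[g:custom_label_0="2020-2021"]' \
  | xmllint --shell sample.xml)

echo "$result"
```

The output also contains the interactive shell's "/ > " prompts, because the commands are fed to it through a pipe.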
An alternative is to test the element name (for example with local-name()) to select the g:custom_label_0 element, and then test the value of that element to see if it's 2020-2021.
Example...
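A minimal sketch of that alternative, assuming a hypothetical sample.xml and an assumed xmlns:g URI. Because the whole expression is wrapped in single quotes for the shell, the XPath string literal can use double quotes; only the stray quotes around the element name had to go.

```shell
#!/bin/sh
# Hypothetical sample feed; the xmlns:g URI is an assumption.
cat > sample.xml <<'EOF'
<rss xmlns:g="http://base.google.com/ns/1.0">
  <channel>
    <item>
      <g:id>108873/10-3</g:id>
      <g:custom_label_0>2020-2021</g:custom_label_0>
    </item>
    <item>
      <g:id>other-1</g:id>
      <g:custom_label_0>2019-2020</g:custom_label_0>
    </item>
  </channel>
</rss>
EOF

# Select every item that has a child whose local name is custom_label_0
# and whose string value is exactly 2020-2021. No prefix binding is needed,
# because local-name() ignores the namespace.
result=$(xmllint --xpath '//item[*[local-name()="custom_label_0"][.="2020-2021"]]' sample.xml)

echo "$result"
```

Two design notes: local-name() matches custom_label_0 in any namespace, so adding [namespace-uri()="..."] to the inner predicate makes the test stricter; and if the values might carry surrounding whitespace, comparing normalize-space(.) instead of . is more forgiving.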