xmllint / XPath: extract parent node where child contains text, from a Google Shopping feed


I am trying to extract all "item" nodes containing a g:custom_label_0 with the text value "2020-2021". So far, I have managed to find all nodes containing the child g:custom_label_0, but I can't work out how to filter by the text value of the field.

Here is the example XML:

    <item>
        <description>[...]</description>
        <g:availability>in stock</g:availability>
        <g:brand>Barts</g:brand>
        <g:condition>new</g:condition>
        <g:custom_label_0>2020-2021</g:custom_label_0>
        <g:id>108873/10-3</g:id>
        <g:image_link>[...]</g:image_link>
        <g:price>26.99 EUR</g:price>
        <g:sale_price>26.99 EUR</g:sale_price>
        <g:shipping>
            <g:country>NL</g:country>
            <g:price>4.50 EUR</g:price>
        </g:shipping>
        <g:shipping_weight>7.95</g:shipping_weight>
        <link>[....]</link>
    </item>
   ...

There are nodes containing values other than 2020-2021, but I want to extract all complete item nodes containing this text. Here's what I used to extract all nodes that have the field:

xmllint --xpath '//item["g:custom_label_0"]' myfile.xml 

I tried adding a text filter via square brackets etc., but I have the feeling the quotation marks around g:custom_label_0 might be causing trouble. Adding more filters inside the quotes is accepted (no error), but then I can't add further quotation marks inside to filter on the string.

This is accepted and throws no error:

xmllint --xpath '//item["g:custom_label_0[text()]"]' myfile.xml 

To filter on the text now, I would need quotation marks again, and escaping them breaks the command. How can I filter on the text "2020-2021" when both types of quotation marks are already used?

Answer by Daniel Haley (best answer)

You're right; the quotes around g:custom_label_0 are causing trouble. They turn the predicate into a string literal, and a non-empty string is always true, so the expression returns all item elements.
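To see the "always true" behaviour for yourself, here is a minimal sketch. It assumes a small stand-in feed (the second item and the name no_such_element are made up for illustration) and the usual Google Shopping namespace URI, which the question does not show.

```shell
#!/bin/sh
# Stand-in feed: the second item is hypothetical and has no g:custom_label_0 at all.
# The namespace URI is an assumption; use the xmlns:g value from your own feed.
feed=$(mktemp)
cat > "$feed" <<'XML'
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <g:custom_label_0>2020-2021</g:custom_label_0>
    </item>
    <item>
      <g:brand>Barts</g:brand>
    </item>
  </channel>
</rss>
XML

# Every item in the feed.
total=$(xmllint --xpath 'count(//item)' "$feed")
# The predicate from the question: a non-empty string literal, so always true.
quoted=$(xmllint --xpath 'count(//item["g:custom_label_0"])' "$feed")
# Any other non-empty string behaves the same way.
bogus=$(xmllint --xpath 'count(//item["no_such_element"])' "$feed")

echo "all items: $total, quoted predicate: $quoted, bogus string: $bogus"
rm -f "$feed"
```

All three counts should come out equal, including for the item that has no g:custom_label_0 child, which is why the quoted predicate never filters anything.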

The g: is a namespace prefix. To bind a namespace to a prefix in xmllint, you have to run it in shell mode (see https://stackoverflow.com/a/8266075/317052 for an example).
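As a rough sketch of the shell-mode route: the script below binds g with setns and then uses the prefix directly in the predicate. The namespace URI is an assumption (it is the one Google Shopping feeds normally declare; check the xmlns:g attribute in your file), and the second item with its id is invented so there is something to filter out.

```shell
#!/bin/sh
# Stand-in feed; the second item and its id are hypothetical.
feed=$(mktemp)
cat > "$feed" <<'XML'
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <g:custom_label_0>2020-2021</g:custom_label_0>
      <g:id>108873/10-3</g:id>
    </item>
    <item>
      <g:custom_label_0>2019-2020</g:custom_label_0>
      <g:id>000000/00-0</g:id>
    </item>
  </channel>
</rss>
XML

# Feed the shell its commands on stdin:
#   setns binds the prefix (URI assumed, see above),
#   cat prints every node selected by the XPath expression.
result=$(xmllint --shell "$feed" <<'EOF'
setns g=http://base.google.com/ns/1.0
cat //item[g:custom_label_0="2020-2021"]
EOF
)

printf '%s\n' "$result"
rm -f "$feed"
```

Because the commands are read by xmllint's own shell rather than by your system shell, the double quotes inside the predicate need no escaping. The output may also include the xmllint shell's prompt lines, so it needs a little cleanup before being reused as XML.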

An alternative is to test the element name to select the g:custom_label_0 element and then test the value of that element to see if it's 2020-2021.

Example...

xmllint --xpath '//item[*[name()="g:custom_label_0"][.="2020-2021"]]' myfile.xml
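If you want to try this out without the full feed, here is a small sketch on a stand-in file (the second and third items are made up). It also shows a variant that is my own assumption rather than part of the original question: matching on local-name() plus namespace-uri(), which does not depend on the feed using the literal prefix g. The namespace URI is again assumed; take it from the xmlns:g attribute in your feed.

```shell
#!/bin/sh
# Stand-in feed: one matching item, one with another value, one without the label.
feed=$(mktemp)
cat > "$feed" <<'XML'
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <g:custom_label_0>2020-2021</g:custom_label_0>
      <g:id>108873/10-3</g:id>
    </item>
    <item>
      <g:custom_label_0>2019-2020</g:custom_label_0>
    </item>
    <item>
      <g:brand>Barts</g:brand>
    </item>
  </channel>
</rss>
XML

# Single quotes for the shell, double quotes for the XPath string literals.
by_name=$(xmllint --xpath \
  'count(//item[*[name()="g:custom_label_0"][.="2020-2021"]])' "$feed")

# Prefix-independent variant (namespace URI is an assumption).
by_local=$(xmllint --xpath \
  'count(//item[*[local-name()="custom_label_0" and namespace-uri()="http://base.google.com/ns/1.0"][.="2020-2021"]])' "$feed")

echo "name() match: $by_name, local-name() match: $by_local"

# Print the complete matching item node(s).
items=$(xmllint --xpath '//item[*[name()="g:custom_label_0"][.="2020-2021"]]' "$feed")
printf '%s\n' "$items"
rm -f "$feed"
```

The name() test compares against the qualified name exactly as written in the file, so it stops matching if the feed ever declares the same namespace under a different prefix; the local-name() form avoids that at the cost of a longer expression.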