xmllint and xpath to parse xml data from https://mail.google.com/mail/feed/atom

I am getting some XML data from my Gmail account that I would like to parse. The XML data looks like:

<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <title>Gmail - Inbox for @gmail.com</title>
  <tagline>New messages in your Gmail Inbox</tagline>
  <fullcount>54</fullcount>
  <link rel="alternate" href="http://mail.google.com/mail" type="text/html"/>
  <modified>2014-11-25T04:40:04Z</modified>
  <entry>
    <title>test</title>
    <summary/>
    ...
</feed>

and I was hoping to get the titles of all the entries with something like:

xmllint --xpath '//feed/entry/title' myfile.xml

I found that this works if the xmlns declaration is absent. With the xmlns declaration present, I get the message

XPath set is empty

I would like a simple one-liner to parse this file, without having to modify the file (removing the xmlns declaration).

--> EDIT: Thanks to @Mathias, the proper one-liner looks like: echo "setns x=http://purl.org/atom/ns#\nxpath /x:feed/x:entry/x:title/text()"

There are 2 answers

Mathias Müller On BEST ANSWER

You are probably aware that your input XML is in a default namespace. Your original XPath expression:

xmllint --xpath '//feed/entry/title' myfile.xml

will never find elements that are in a namespace, because an unprefixed name in an XPath 1.0 expression only matches elements that are in no namespace. That's why the XPath result set is empty.

If you're absolutely unwilling to register or declare a namespace, the following expression works:

xmllint --xpath "//*[name() = 'feed']/*[name() = 'entry']/*[name() = 'title']" myfile.xml

If your input XML contained prefixed namespaces, you'd have to use local-name() instead of name().
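
To illustrate the difference, here is a minimal sketch. It assumes a hypothetical file prefixed-sample.xml in which the same namespace is bound to a prefix a; neither the file name nor the prefix comes from the question. The expressions are wrapped in string() so that an expression with no match prints an empty line instead of the "XPath set is empty" message. Note that string() returns the text of the first matching node only.

```shell
# Hypothetical sample: the feed from the question, but with a prefixed namespace.
cat > prefixed-sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<a:feed xmlns:a="http://purl.org/atom/ns#" version="0.3">
  <a:entry>
    <a:title>test</a:title>
  </a:entry>
</a:feed>
EOF

# name() returns the qualified name ("a:title"), so comparing it to 'title' finds nothing.
by_name=$(xmllint --xpath "string(//*[name()='title'])" prefixed-sample.xml)

# local-name() drops the prefix, so the same comparison matches.
by_local_name=$(xmllint --xpath "string(//*[local-name()='title'])" prefixed-sample.xml)

printf 'name():       [%s]\n' "$by_name"
printf 'local-name(): [%s]\n' "$by_local_name"
```

With the default (unprefixed) namespace from the question, both functions return the same string, which is why name() is enough there.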


An alternative that is not a "simple one-liner" is to use xmllint in shell mode, register the namespace under a prefix and use that prefix in the XPath expression. See this answer for details. That's the proper way of addressing the problem.
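
A rough sketch of that approach, under some assumptions: feed-sample.xml is a hypothetical file trimmed from the feed in the question, and the prefix x is an arbitrary choice. The setns and xpath commands are fed to the xmllint shell on standard input, one per line. printf is used here because bash's builtin echo does not expand "\n" unless called with -e.

```shell
# Hypothetical sample file, trimmed from the feed in the question.
cat > feed-sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <title>Gmail - Inbox</title>
  <fullcount>54</fullcount>
  <entry>
    <title>test</title>
    <summary/>
  </entry>
</feed>
EOF

# Send two shell commands to xmllint, one per line:
#   1. setns binds the prefix x to the feed's default namespace URI
#   2. xpath evaluates an expression that uses that prefix
shell_output=$(
  printf '%s\n' \
    'setns x=http://purl.org/atom/ns#' \
    'xpath /x:feed/x:entry/x:title/text()' |
    xmllint --shell feed-sample.xml
)

printf '%s\n' "$shell_output"
```

The output also contains the shell prompts and a description of the node set, so it may need filtering before further processing.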

Akhil Thayyil On

Try debugging the expression in the xmllint shell:

xmllint --shell filename

xpath //feed/entry

Debug like this, stepping through the nodes one level at a time, so that you can see where the expression stops matching.
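
The same kind of check can also be scripted with --xpath instead of the interactive shell. The following is a minimal sketch, assuming a hypothetical feed-sample.xml trimmed from the feed in the question. It inspects the root element first, then compares a plain name test with a local-name() test at that level.

```shell
# Hypothetical sample file, trimmed from the feed in the question.
cat > feed-sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://purl.org/atom/ns#" version="0.3">
  <entry>
    <title>test</title>
  </entry>
</feed>
EOF

# Step 1: what is the root element called, and which namespace is it in?
root_name=$(xmllint --xpath "name(/*)" feed-sample.xml)
root_ns=$(xmllint --xpath "namespace-uri(/*)" feed-sample.xml)

# Step 2: how many nodes does each kind of test match at the root level?
plain_matches=$(xmllint --xpath "count(/feed)" feed-sample.xml)
local_matches=$(xmllint --xpath "count(/*[local-name()='feed'])" feed-sample.xml)

printf 'root element:      %s\n' "$root_name"
printf 'root namespace:    %s\n' "$root_ns"
printf 'matches for /feed: %s\n' "$plain_matches"
printf 'matches by local-name(): %s\n' "$local_matches"
```

A non-empty namespace URI together with a zero count for the plain name test points to the namespace as the place where the expression breaks.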