I need to extract some hyperlinks in a Bash script.
The following command uses curl and xmllint to read all href attributes of an HTML page:
curl --silent -L google.com | xmllint --html --xpath '//a/@href' -
But I need only the values of the attributes. The value of an attribute can be selected with the string() function, but if I use it, I get only the value of the first attribute in the node set:
curl --silent -L google.com | xmllint --html --xpath 'string(//a/@href)' -
How can I apply the string() function to each attribute?
You could do (notice the difference in the XPath expression):
curl --silent -L google.com | xmllint --html --xpath '//a/@*' -

and then add another pipe to send the output to sed, filtering out the attribute names to get the values you want. But this is a rather odd way to extract data from a document.
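For example, the filtering step could look like the sketch below. It operates on a hypothetical sample of xmllint's attribute-node output (the exact layout of that output varies between libxml2 versions, so the pattern may need adjusting); the sample URLs are placeholders, not real output from google.com:

```shell
# Hypothetical sample of what xmllint --xpath '//a/@href' prints:
# attribute nodes rendered as href="value" pairs.
sample=' href="https://example.com/a" href="https://example.com/b"'

# Pull out each href="..." pair, then strip the attribute name
# and the surrounding quotes, leaving one URL per line.
printf '%s' "$sample" \
  | grep -o 'href="[^"]*"' \
  | sed 's/^href="//; s/"$//'
```

In the full pipeline you would replace `printf '%s' "$sample"` with `curl --silent -L google.com | xmllint --html --xpath '//a/@href' -`. Note that this is text munging, not XML-aware extraction, which is why the answer calls it an odd way to pull data out of a document.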