I need to grab some hyperlinks in a Bash script. The following command uses curl and xmllint to read all href attributes of an HTML page:
curl --silent -L google.com | xmllint --html --xpath '//a/@href' -
But I need only the values of the attributes. The value of an attribute can be selected with the string() function. But if I use it, I get only the first element of the list of attributes:
curl --silent -L google.com | xmllint --html --xpath 'string(//a/@href)' -
How can I apply the string() function to each attribute?
You could do (notice the difference in the XPath expression):
curl --silent -L google.com | xmllint --html --xpath '//a/@*' -
and then add another pipe to send the output to sed, stripping the attribute names so that only the values remain. But this is a rather awkward way to extract data from a document.
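A sketch of that sed step, assuming each attribute is printed on its own line in the form ` href="value"` (newer xmllint/libxml2 versions separate attribute nodes with newlines; older ones may run them together on one line, in which case the substitution needs a `g` flag and an inserted newline). The demo below runs on sample xmllint-style output so it works offline; substitute the curl pipeline for the printf in real use:

```shell
# Real pipeline (requires network access and xmllint):
#   curl --silent -L google.com \
#     | xmllint --html --xpath '//a/@*' - 2>/dev/null \
#     | sed -e 's/^ *[^=]*="//' -e 's/"$//'
#
# xmllint prints each attribute node as ` name="value"`, so sed
# strips the leading ` name="` and the trailing quote.
# Demonstrated here on sample input (example.com URLs are placeholders):
printf ' href="https://example.com/a"\n href="https://example.com/b"\n' \
  | sed -e 's/^ *[^=]*="//' -e 's/"$//'
```

This prints one URL per line, which is convenient for a `while read` loop in Bash.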