I'm trying to get the actual value of a given XPath. I have the following code in a sample.rb file:
require 'rubygems'
require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML(open('http://www.changebadtogood.com/'))

desc "Trying to get the value of given xpath"
task :sample do
  begin
    doc.xpath('//*[@id="view_more"]').each do |link|
      puts link.content
    end
  rescue Exception => e
    puts "error"
  end
end
Output is:
View more issues ..
When I try to get the value for a different XPath, such as:
/html/body/div[4]/div[3]/h1/span
then I get the "error" message.
I tried this in Nokogiri. I don't know why it returns a result for only a few XPaths.
I tried the same in Hpricot.
http://hpricot.com/demonstrations
I pasted my URL and XPaths there, and I see the result for
//*[@id="view_more"]
as
View more issues ..
[This text is present at the bottom of the recent issues header.]
But it does not show a result for:
/html/body/div[4]/div[3]/h1/span
For this XPath I'm expecting the result Bad.
[This is present at http://www.changebadtogood.com/ as the first header of the class="hero-unit" div.]
Your problem has to do with a poor XPath selector, and is unrelated to Nokogiri or Hpricot. Let's investigate:
From this we can see that there are only two divs that are children of the <body> element, and so div[4] fails to select one.
It appears that you're trying to select the span here:
Instead of relying on the fragile markup leading up to this (indexing anonymous hierarchies of elements), use the semantic structure of the document to your advantage for a selector that is both simpler and more robust. Using either CSS or XPath syntax: