I would like to use Hpricot to scan the inner_text of all elements and know which element is currently being scanned. However, every approach I have tried leads to recursion. Is there a built-in function to do this with Hpricot (or Nokogiri)? The code below only scans one level down:
@t = []
doc = Hpricot(open("some html doc"))
(doc/"html").each do |e|
  e.children.each do |child|
    if child.is_a?(Hpricot::Text)
      @t << child.to_s.strip
    end
  end
end
Although I'm not sure exactly why you want to collect all text nodes (perhaps there is a more efficient solution), this should get you started:
It uses Nokogiri's traverse, which will visit all nodes under your starting node.