Hpricot: How to extract inner text without other html subelements

Question

Asked by Yan Pritzker on 22 January 2012 at 22:53

I'm working on a Vim RSpec plugin (https://github.com/skwp/vim-rspec) and am parsing some HTML output from RSpec. It looks like this:

doc = %{
<dl>
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
 </dl>
}

I can get the entire inner HTML of the <dl> using:

(Hpricot.parse(doc)/:dl).first.inner_html

I can get just the <dt> by using:

(Hpricot.parse(doc)/:dl).first/:dt

But how can I access the "Some puts output here" area? If I use inner_html, there is way too much other junk to parse through. I've looked through the Hpricot docs but don't see an easy way to get just the inner text of an HTML element, disregarding its HTML children.

Original Q&A

There are 2 answers

**Phrogz** · Answer 1 · 2012-01-23T02:19:48+00:00

Note that this is bad HTML you have there. If you have control over it, you should wrap the content you want in a <dd>.
In XML terms what you are looking for is the TextNode following the <dt> element. In my comment I showed how you can select this node using XPath in Nokogiri.
However, if you must use Hpricot and cannot select text nodes with it, you could hack this by getting the inner_html and then stripping out the unwanted markup:
```
# Allow attributes on <dt> (the sample has an id) and let . span newlines
(Hpricot.parse(doc)/:dl).first.inner_html.sub(%r{<dt\b[^>]*>.*?</dt>}m, '')
```
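
The stripping step can be tried on its own, without Hpricot installed. The following is only a sketch: the `inner_html` string is a hand-written stand-in for what Hpricot would return for the sample `<dl>`, and `strip_dt` is a hypothetical helper name. The pattern has to allow attributes on the opening tag, because the sample `<dt>` carries an `id`.

```ruby
# Stand-in for the string Hpricot's inner_html would return for the <dl>
# (assumed here; no parser is involved in this sketch).
inner_html = <<~HTML
  <dt id="example_group_1">This is the heading text</dt>
  Some puts output here
HTML

# Hypothetical helper: remove every <dt>...</dt> pair, whatever attributes
# the opening tag has. The /m flag lets . match newlines in case a heading
# spans several lines.
def strip_dt(html)
  html.gsub(%r{<dt\b[^>]*>.*?</dt>}m, '').strip
end

puts strip_dt(inner_html)
```

As with any regex over HTML, this is fragile: it assumes `<dt>` elements are never nested and that no other child elements need removing.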

**Yan Pritzker** · Answer 2 · 2012-01-24T04:10:01+00:00

I ended up working out a solution myself by manually walking the children:

(@context/"dl").each do |dl|
  dl.children.each do |child|
    if child.is_a?(Hpricot::Elem) && child.name == 'dd'
      # do stuff with the element
    elsif child.is_a?(Hpricot::Text)
      # bare text sitting directly inside the <dl>
      text = child.to_s.strip
      puts text unless text.empty?
    end
  end
end
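
The same idea, keeping only the text nodes that are direct children, can be sketched with REXML in place of Hpricot. REXML is a swapped-in parser, chosen here only so the example runs on a stock Ruby install; unlike Hpricot, it accepts only well-formed XML, which the sample happens to be. `direct_text` is a hypothetical helper name.

```ruby
require 'rexml/document'

# REXML stands in for Hpricot here (an assumption made so the sketch runs
# without extra gems); it only accepts well-formed XML.
xml = <<~XML
  <dl>
    <dt id="example_group_1">This is the heading text</dt>
    Some puts output here
  </dl>
XML

# Hypothetical helper: collect the text nodes that are direct children of
# an element, skipping child elements and whitespace-only nodes.
def direct_text(element)
  element.children
         .select { |child| child.is_a?(REXML::Text) }
         .map { |child| child.value.strip }
         .reject(&:empty?)
end

dl = REXML::Document.new(xml).root
puts direct_text(dl)
```

Filtering on the node class rather than stripping markup from a string avoids regex matching entirely, at the cost of depending on how the parser splits text nodes.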
