Hpricot parses an XML file incorrectly: misreads empty tag

I have an XML file with the following contents:

<data>
  <title/>
  <creator/>
  <copyright>Foo</copyright>
</data>

There's no doctype, which might be related to the problem. I'm reading it in with the Hpricot gem (0.8.1) in Rails 2.2.2. When Hpricot parses the file, it doesn't treat the title tag as empty and instead wraps it around its sibling tags:

doc = Hpricot(File.read("#{ENV['HOME']}/test.xml")) 
=> #<Hpricot::Doc {elem <data> "\n  " {elem <title> "\n  " {emptyelem <creator>} "\n  " {elem <copyright> "Foo" </copyright>} "\n"} </data>} "\n">

puts doc.to_s
<data>
  <title>
  <creator />
  <copyright>Foo</copyright>
</title></data>

Can anyone explain what's going wrong? I don't see why the title tag would be treated differently from the creator tag: if the missing doctype were the issue, I'd expect both tags to be affected, or neither of them.
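
For comparison, here is a minimal sketch of how a strict XML parser reads the same input. It uses REXML, which ships with Ruby, rather than Hpricot; that substitution is an assumption made for illustration and is not part of the setup above. One plausible explanation for the difference is that `Hpricot(...)` parses in HTML mode by default, where `title` is a known element that normally has content while `creator` is unknown to HTML. Hpricot also offers `Hpricot.XML(...)` for XML input, which may be worth trying, though it is untested here against 0.8.1.

```ruby
require "rexml/document"

# Same content as test.xml in the question, inlined so the sketch is self-contained.
xml = <<~XML
  <data>
    <title/>
    <creator/>
    <copyright>Foo</copyright>
  </data>
XML

doc  = REXML::Document.new(xml)
data = doc.root

# Names of the direct child elements of <data>.
child_names = data.elements.to_a.map(&:name)

# In a strict XML parse, <title/> is self-closed, so it has no child elements.
title          = data.elements["title"]
title_children = title.elements.to_a.map(&:name)

puts child_names.inspect
puts title_children.inspect
```

If the Hpricot result matched this tree, `title`, `creator` and `copyright` would all be siblings under `data`, and `title` would have no children.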
