How to delete all comments from an XML document with REXML + XPath?


I've got an XML file that has a ton of comments that make the file super large and muddy. Is it possible to delete the comments out of it with REXML?

I've tried this, but it isn't working (though, strangely enough, it's not failing either):

doc.elements.each('//comment()') { |n| doc.delete n }
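A likely explanation for the silent no-op, based on my reading of the REXML source rather than its documentation: `REXML::Elements#each` runs the XPath query but only yields nodes that are `REXML::Element` instances. Comment nodes selected by `//comment()` are therefore filtered out and the block never runs. The sketch below compares the two calls on a tiny sample document; the variable names are only for illustration.

```ruby
require 'rexml/document'

doc = REXML::Document.new("<root><foo><!-- comment --></foo></root>")

# XPath.match returns every node the expression selects, comments included.
matched = REXML::XPath.match(doc, '//comment()')

# Elements#each evaluates the same XPath but only yields REXML::Element
# instances, so for '//comment()' the block is never called.
yielded = 0
doc.elements.each('//comment()') { |_n| yielded += 1 }

puts "XPath.match found #{matched.size} comment(s)"
puts "Elements#each yielded #{yielded} node(s)"
```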

UPDATE

This works:

require 'rexml/document'

doc = REXML::Document.new "<root><foo><!-- comment --></foo></root>"

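# Walk every element and detach each of its comment children.
# (Document#elements takes no arguments; the XPath goes to Elements#each.)
doc.elements.each('//*') { |n| n.comments.each { |c| c.parent = nil } }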

formatter = REXML::Formatters::Pretty.new(4)

formatter.compact = true

puts formatter.write(doc.root, '')

# Output:  
#
# <root>
#     <foo/>
# </root>

I got the solution from here (ruby-doc.org).
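One thing to watch with the UPDATE approach: `//*` selects only elements, and the document node is not matched by `*`, so a comment placed before or after the root element survives. (The output above prints only `doc.root`, so it would not show.) Below is a small sketch of that case, assuming a hypothetical input with a comment ahead of `<root>`. Since `REXML::Document` is a subclass of `REXML::Element`, calling `comments` on the document itself clears the leftover.

```ruby
require 'rexml/document'

xml = "<!-- top-level --><root><foo><!-- inside foo --></foo></root>"
doc = REXML::Document.new(xml)

# The UPDATE approach: detach the comment children of every element.
doc.elements.each('//*') { |n| n.comments.each { |c| c.parent = nil } }

# '//*' never selects the Document node, so its own comment is still there.
leftover = REXML::XPath.match(doc, '//comment()')
puts "comments left after the loop: #{leftover.size}"

# Document inherits Element#comments, so the same call clears the rest.
doc.comments.each { |c| c.parent = nil }
remaining = REXML::XPath.match(doc, '//comment()')
puts "comments left now: #{remaining.size}"
```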


There are 2 answers

Martin Honnen

Try

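# Recursively remove comments: first the direct comment children of this
# node, then the same for each of its child elements.
def del_comments(node)
  node.comments.each { |comment| node.delete comment }
  node.elements.each { |child| del_comments(child) }
end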

del_comments(doc)

A complete snippet is

require "rexml/document"
include REXML  # so that we don't have to prefix everything with REXML::...
string = <<EOF
<!-- comment 1 -->
  <mydoc>
    <someelement attribute="nanoo">Text, text, text</someelement>
    <!-- comment 2 -->
    <foo>
      <!-- comment 3 -->
      <bar>whatever</bar>
      <!-- comment 4 -->
    </foo>
    <!-- comment 5 -->
    <baz>...</baz>
    <!-- comment 6 -->
  </mydoc>
<!-- comment 7 -->
EOF

doc = Document.new string

def del_comments(node)
  node.comments().each { |comment| node.delete comment }
  node.elements().each { |child| del_comments(child) }
end

del_comments(doc)

puts doc

which outputs

  <mydoc>
    <someelement attribute='nanoo'>Text, text, text</someelement>

    <foo>

      <bar>whatever</bar>

    </foo>

    <baz>...</baz>

  </mydoc>

so all comments are removed.
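To double-check that the recursion reaches every level, including comments outside the root element, you can ask XPath for any remaining comment nodes afterwards. This is a sketch on a shortened, hypothetical version of the sample above (trimmed for brevity, not the original file).

```ruby
require 'rexml/document'

# One comment outside the root element, comments at two nesting levels inside.
xml = <<~XML
  <!-- comment 1 -->
  <mydoc>
    <!-- comment 2 -->
    <foo><!-- comment 3 --><bar>whatever</bar></foo>
  </mydoc>
XML

doc = REXML::Document.new(xml)

def del_comments(node)
  node.comments.each { |comment| node.delete comment }
  node.elements.each { |child| del_comments(child) }
end

del_comments(doc)

# Search the whole document for any comment node that is still attached.
remaining = REXML::XPath.match(doc, '//comment()')
puts "comments remaining: #{remaining.size}"
```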

nash
REXML::XPath.match(doc, '//comment()').each(&:remove)

REXML::XPath is a class containing methods for searching for nodes in a document. Its match method returns an array of nodes. The first argument is the node the search starts from; the second is the XPath expression to evaluate.
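To illustrate the two arguments, here is a small sketch with a made-up document: passing an element as the first argument together with a relative expression (`.//comment()`) limits the search to that element's subtree, while starting from the document with `//comment()` covers everything. `Comment#string` is used only to show which comment was found.

```ruby
require 'rexml/document'

doc = REXML::Document.new(
  "<root><!-- keep me --><foo><!-- drop me --><bar/></foo></root>"
)

# Context node: <foo>. The leading '.' keeps the search inside its subtree.
foo = doc.root.elements['foo']
in_foo = REXML::XPath.match(foo, './/comment()')

# Context node: the document. '//comment()' searches the whole tree.
everywhere = REXML::XPath.match(doc, '//comment()')

puts in_foo.map(&:string).inspect
puts everywhere.size
```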

It returns an array containing all matching nodes (here, comment nodes), and you then call remove on each one, which detaches it from its parent. Because '//comment()' selects every comment in the document, the expression above removes all of them.
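A self-contained run of the one-liner on a hypothetical input that also has a comment outside the root element. `XPath.match` returns a plain Array, so calling `remove` on each node while iterating does not disturb the iteration; `each` hands back that same array, which is used here only to count the removals.

```ruby
require 'rexml/document'

xml = "<!-- before root --><mydoc><foo><!-- nested --><bar>whatever</bar></foo></mydoc>"
doc = REXML::Document.new(xml)

# Collect every comment node, then detach each one from its parent.
removed = REXML::XPath.match(doc, '//comment()').each(&:remove)

puts "removed #{removed.size} comments"
puts doc
```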

Link to the REXML::XPath documentation