Preserving DOCTYPE declaration with REXML

I'm trying to parse a log4j.xml file, edit some attributes, and write it back.

The log4j.xml file has the <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd"> declaration, but when I write it back, the declaration is changed to <!DOCTYPE log4j>.

I open the file for parsing with xmlDoc = Document.new(File.new(file, 'r')) and write it back with xmlDoc.write(File.new(file, 'w'), 0).

I've also tried opening it with xmlDoc = Document.new(File.new(file, 'r'), { :raw => :all }).

Is there a way to preserve the original DOCTYPE declaration?

Thank you very much!

Answer by WarHog (best answer):

I'm afraid that isn't possible with REXML. Look at this brief summary, a 'light version' of what takes place inside the REXML library when it parses the DOCTYPE:

require 'rexml/source'

# Patterns reduced from the ones the REXML parser uses
LETTER = '[:alpha:]'
COMBININGCHAR = ''
EXTENDER = ''
NCNAME_STR = "[#{LETTER}_:][-[:alnum:]._:#{COMBININGCHAR}#{EXTENDER}]*"

# Group 1 is the doctype name; note that its character class has no ':'
IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME_STR})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um

string = <<HERE
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >
<log4j:configuration>
</log4j:configuration>
HERE
# Find the DOCTYPE declaration, then split its identity into parts
source = REXML::SourceFactory.create_from(string)
md = source.match( DOCTYPE_PATTERN, true )
identity = md[1]
close = md[2]
identity =~ IDENTITY
name = $1
pub_sys = $2.nil? ? nil : $2.strip
long_name = $4.nil? ? nil : $4.strip
uri = $6.nil? ? nil : $6.strip
args = [ :start_doctype, name, pub_sys, long_name, uri ]
p args  # => [:start_doctype, "log4j", nil, nil, nil]

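As you can see, this snippet returns the same result as the code in your question. The first group of IDENTITY does not accept ':', so the name stops at "log4j" and the remaining optional groups match nothing. Besides, there are no parameters in the snippet that could change this behavior.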
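
To see where the name gets cut off, the two patterns can also be applied with plain String#match, without going through REXML's source classes. This is only a sketch: the patterns are copied from the snippet above, and WIDENED_IDENTITY is a hypothetical variant made up for illustration, not something REXML provides or a supported way to patch it.

```ruby
# Patterns copied from the snippet above.
NCNAME = "[[:alpha:]_:][-[:alnum:]._:]*"
IDENTITY = /^([!\*\w\-]+)(\s+#{NCNAME})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u
DOCTYPE_PATTERN = /\s*<!DOCTYPE\s+(.*?)(\[|>)/um

# Hypothetical variant for illustration only: group 1 also accepts ':'.
WIDENED_IDENTITY = /^([!\*\w\-:]+)(\s+#{NCNAME})?(\s+["'](.*?)['"])?(\s+['"](.*?)["'])?/u

declaration = '<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >'

# Everything between "<!DOCTYPE " and the closing ">" (or "[").
identity = declaration.match(DOCTYPE_PATTERN)[1]

original = identity.match(IDENTITY)
widened  = identity.match(WIDENED_IDENTITY)

p original[1]       # => "log4j"
p original[2]       # => nil
p widened[1]        # => "log4j:configuration"
p widened[2].strip  # => "SYSTEM"
p widened[4]        # => "log4j.dtd"
```

With the original pattern the match ends right before the colon, so the SYSTEM keyword and the DTD file name are never captured, which is why they are missing when the document is written back.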

As a workaround, I suggest using the Nokogiri library. At a quick look, it parses this DOCTYPE properly:

require 'nokogiri'

string = <<HERE
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd" >
<log4j:configuration>
</log4j:configuration>
HERE

doc = Nokogiri::XML(string)
puts doc.internal_subset.to_s
# => <!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">