JavaScript DOMParser and XMLSerializer remove XML entities


I am trying to preserve some XML entities when parsing XML files in JavaScript. The following code snippet illustrates the problem. Is there a way to make a round-trip parse and retain the XML entities (&#160; is &nbsp; in HTML)? This happens in Chrome, Firefox, and IE10.

var aaa='<root><div>&#160;one&#160;two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
new XMLSerializer().serializeToString(doc)
"<root><div> one two</div></root>"

The issue is that I am taking some chunks out of HTML and storing them in XML, and then I want to get the spaces back in XML when I'm done. Edit: As Dan and others have pointed out, the parser replaces the entity with the character whose code is 160 (U+00A0, a no-break space, which is not an ASCII character). To my eyes it looks like an ordinary space, but:

var str1=new XMLSerializer().serializeToString(doc)
str1.charCodeAt(15)
160

So wherever my application is losing the spaces, it is not here.
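Because a no-break space is visually indistinguishable from a regular space, it can help to locate the characters programmatically rather than by eye. The following is a minimal sketch; `findNbsp` is a hypothetical helper name, and the `serialized` string is written out by hand to stand in for what `serializeToString` returned above.

```javascript
// Hypothetical helper: return the index of every no-break space (U+00A0) in a string.
function findNbsp(s) {
  var positions = [];
  for (var i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) === 160) {
      positions.push(i);
    }
  }
  return positions;
}

// Stand-in for the serializer output shown above (assumed, typed by hand with \u00a0 escapes).
var serialized = '<root><div>\u00a0one\u00a0two</div></root>';
var nbspPositions = findNbsp(serialized); // [11, 15]
```

Index 15 is the same position the `charCodeAt(15)` check above inspects.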

1 Answer

dandavis (best answer):

You can use a ranged RegExp to turn the special characters back into their XML representations. Here it is as a reusable function:

function escapeExtended(s){
 // Replace each character in the range U+0080-U+00FF with a numeric character reference.
 return s.replace(/[\x80-\xff]/g, function (ch) {
   return "&#" + ch.charCodeAt(0) + ";";
 });
}


var aaa='<root><div>&#160;one&#160;two</div></root>'
var doc=new DOMParser().parseFromString(aaa,'application/xml')
var str= new XMLSerializer().serializeToString(doc);
alert(escapeExtended(str)); // shows: "<root><div>&#160;one&#160;two</div></root>"
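The range `[\x80-\xff]` only reaches U+00FF, so characters such as dashes, curly quotes, or emoji would pass through unescaped. If that matters, one possible variation is sketched below. `escapeNonAscii` is a hypothetical name, not part of the answer above; it sticks to ES5 syntax since IE10 is mentioned in the question, and it makes no attempt to treat unpaired surrogates specially.

```javascript
// Hypothetical variant: escape every character above U+007F, including
// characters outside the Basic Multilingual Plane (stored as surrogate pairs).
function escapeNonAscii(s) {
  // First alternative matches a surrogate pair; second matches any other non-ASCII code unit.
  return s.replace(/[\ud800-\udbff][\udc00-\udfff]|[\u0080-\uffff]/g, function (m) {
    var code = m.length === 2
      // Combine the high and low surrogate into a single code point.
      ? (m.charCodeAt(0) - 0xd800) * 0x400 + (m.charCodeAt(1) - 0xdc00) + 0x10000
      : m.charCodeAt(0);
    return '&#' + code + ';';
  });
}

var escaped = escapeNonAscii('\u00a0one \u2014 \ud83d\ude00');
// escaped is "&#160;one &#8212; &#128512;"
```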

Note that named entities (e.g. &quot;) will lose their symbolic names: this function only produces numeric character references (the &#number; kind). You can't get the names back without a large conversion table.
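To illustrate what such a conversion table might look like, here is a deliberately tiny sketch. `ENTITY_NAMES` and `escapeWithNames` are hypothetical names, and the table lists only three of the many HTML entity names. Keep in mind that names like &nbsp; are defined by HTML, not by XML, so this output is meant for an HTML destination; an XML parser would reject them unless a DTD declares them.

```javascript
// Hypothetical, intentionally incomplete table: character code -> HTML entity name.
var ENTITY_NAMES = { 160: 'nbsp', 169: 'copy', 233: 'eacute' };

// Prefer a named entity when the table has one; otherwise fall back to &#number;.
function escapeWithNames(s) {
  return s.replace(/[\x80-\xff]/g, function (ch) {
    var code = ch.charCodeAt(0);
    if (Object.prototype.hasOwnProperty.call(ENTITY_NAMES, code)) {
      return '&' + ENTITY_NAMES[code] + ';';
    }
    return '&#' + code + ';';
  });
}

var named = escapeWithNames('\u00a0caf\u00e9 \u00a9 \u00b1');
// named is "&nbsp;caf&eacute; &copy; &#177;"
```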