XSL unescape HTML inside CDATA


I'm trying to transform XML:

<catalog>
  <country><![CDATA[ WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS ]]></country>
</catalog>

into

<catalog>
  <country><![CDATA[  WIN8 <b>X</b> Mac OS ]]></country>
</catalog>

with an XSL transform.

I know that with disable-output-escaping="yes" or cdata-section-elements I can turn escaped characters into unescaped ones and wrap them in CDATA, but this does not work if the characters are already inside CDATA.

Is there a simple way to do this? Thanks.


There are 2 answers

Answer by Tomalak (best answer)

This

<catalog>
  <country><![CDATA[  WIN8 <b>X</b> Mac OS ]]></country>        
</catalog>

is equivalent to

<catalog>
  <country> WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS </country>
</catalog>

and that is exactly what you get when using this stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes" />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="country/text()">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>

The point is that disable-output-escaping (DOE) has no effect on text written into an element that is listed in cdata-section-elements (CSE), because both directives do the same thing: they disable output escaping.

The text value " WIN8 <b>X</b> Mac OS " becomes:

  • when serialized normally: WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS

  • when serialized with CSE: <![CDATA[ WIN8 <b>X</b> Mac OS ]]>

  • when serialized with DOE: WIN8 <b>X</b> Mac OS

Note how the last two renderings are exactly the same, except for the enclosing <![CDATA[ ... ]]>.

CSE disables output escaping for the text node children of an element and, in exchange, encloses them in <![CDATA[ ... ]]> markers to make up for the lost level of escaping.

If you additionally set DOE on an <xsl:value-of> that writes text into an element that has CSE set, nothing changes: output escaping is already disabled.

Therefore this

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes" />
  <xsl:output cdata-section-elements="country" />

  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="country/text()">
    <xsl:value-of select="." disable-output-escaping="yes" />
  </xsl:template>
</xsl:stylesheet>

will give you exactly what your input was: the text is wrapped in CDATA again, with the &lt; and &gt; sequences inside it left untouched.

That's why you cannot both get rid of the double escaping and have CDATA in a single transformation. You could use a two-step approach (the first step disables output escaping, the second step adds CDATA back) if you positively must have CDATA in the result document, but personally I think it's not worth it.
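For illustration, the second step of that two-step approach could look like the sketch below. It assumes the first step is the DOE-only stylesheet shown earlier and that its serialized output is parsed again as the input of this pass. How the two passes are chained (two processor invocations, a pipeline, and so on) depends on your tooling and is not shown here.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Second pass: serialize the text children of <country> as CDATA sections. -->
  <xsl:output omit-xml-declaration="yes" cdata-section-elements="country" />

  <!-- Identity template: copy every node and attribute unchanged. -->
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*" />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

After the first pass the document contains <country> WIN8 &lt;b&gt;X&lt;/b&gt; Mac OS </country>. Once that is parsed again, the text value is " WIN8 <b>X</b> Mac OS ", which this pass writes as <![CDATA[ WIN8 <b>X</b> Mac OS ]]>. Keep in mind that the first pass writes the original CDATA content out raw, so a bare & in it would make the intermediate document malformed.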

Answer by Matteo Conta

Another solution is to put the CDATA section inside an xsl:text element with disable-output-escaping="yes":

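<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <!-- The CDATA section keeps the stylesheet well-formed; DOE asks the processor to write its content unescaped. -->
    <xsl:text disable-output-escaping="yes"><![CDATA[
    <script>
    var thisTextIsNotEscaped = "<b>this text is normally escaped, but not in this case</b>";
    </script>
    ]]>
</xsl:text>
  </xsl:template>
</xsl:stylesheet>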