How does the DTD declared in the DOCTYPE influence an XSLT transformation?

I am transforming a TEI-encoded text document (TEI is the Text Encoding Initiative, a standard for encoding text documents) using XSLT 1.0 and 2.0 with various processors.

I am experiencing a very peculiar problem: depending on which DTD the DOCTYPE declaration of the XML file points to, I get different results. An example input file:

<!DOCTYPE TEI SYSTEM "tei_lite.dtd">
<TEI>
  <teiHeader>  
    <fileDesc>
      <titleStmt>
        <title>Przyjaciel szczery</title>
        <author>Jan Daniecki</author>
        <respStmt>
          <resp>wyd.</resp> 
          <name>Maciej Eder</name>
        </respStmt>
      </titleStmt>
    </fileDesc>
   </teiHeader>
</TEI>

The following XSLT stylesheet should DELETE the author node:

<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="author"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

However, the stylesheet ONLY deletes the node if I replace the DTD (which would be too long to post here) with an empty one.

I found out why: using the DTD introduces a namespace. For the stylesheet to work, I need to declare that namespace and use its prefix in the match pattern; then everything works:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
xmlns:tei="http://www.tei-c.org/ns/1.0" version="2.0">
  <xsl:template match="tei:author"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

I thought that only default attribute nodes are affected by the DTD definition during XSLT transformation. Could someone summarize how and when the DTD adds namespaces that need to be accounted for?

Thanks!

Best answer, by Jon Hanna:

"I thought that only default attribute nodes are affected by the DTD definition during XSLT transformation."

Well, a DTD can also define entities, but defaulted attributes are precisely what is at work here.
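As a brief aside on the entity case, here is a minimal sketch. It assumes a hypothetical entity named editor, declared in an internal DTD subset purely for illustration (it is not taken from the TEI DTD):

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI [
  <!-- Hypothetical entity, declared only for this illustration -->
  <!ENTITY editor "Maciej Eder">
]>
<TEI>
  <name>&editor;</name>
</TEI>
```

The parser expands the reference before the stylesheet runs, so the transformation only ever sees the replacement text inside the name element.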

"Could someone summarize how and when the DTD adds namespaces that need to be accounted for?"

Assuming the DTD you are using contains a declaration like the following, note:

<!ATTLIST TEI xmlns CDATA "http://www.tei-c.org/ns/1.0">

So the DTD says that the <TEI> element has a default value of http://www.tei-c.org/ns/1.0 for its xmlns attribute. Unless the element carries an explicit xmlns attribute in the source, it must be treated as if it had xmlns="http://www.tei-c.org/ns/1.0" on it.
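The effect can be reproduced without the full TEI DTD. The following is a minimal sketch, assuming an internal DTD subset that carries only this one ATTLIST declaration and stands in for the much larger external tei_lite.dtd:

```xml
<?xml version="1.0"?>
<!-- Hypothetical test document: the internal subset stands in for tei_lite.dtd -->
<!DOCTYPE TEI [
  <!ATTLIST TEI xmlns CDATA "http://www.tei-c.org/ns/1.0">
]>
<TEI>
  <teiHeader>
    <author>Jan Daniecki</author>
  </teiHeader>
</TEI>
```

In this sketch the default is declared for TEI only. The descendants still end up in the TEI namespace, because a default namespace declaration on an element is in scope for everything inside it. An unprefixed match="author" should therefore fail against this document in the same way it does with the full DTD.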

Now, while a namespace-aware XML process such as an XSLT transformation distinguishes between a namespace declaration and an ordinary attribute, a non-namespace-aware process such as DTD validation treats namespace declarations like any other attribute. (This is how namespaces could be added to XML in a backwards-compatible way.)

In fact most (all?) of the elements are declared this way, so once the parser has applied the DTD's attribute defaults, your document:

<TEI>
  <teiHeader>  
    <fileDesc>
      <titleStmt>
        <title>Przyjaciel szczery</title>
        <author>Jan Daniecki</author>
        <respStmt>
          <resp>wyd.</resp> 
          <name>Maciej Eder</name>
        </respStmt>
      </titleStmt>
    </fileDesc>
   </teiHeader>
</TEI>

becomes:

<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xmlns="http://www.tei-c.org/ns/1.0">
    <fileDesc xmlns="http://www.tei-c.org/ns/1.0">
      <titleStmt xmlns="http://www.tei-c.org/ns/1.0">
        <title xmlns="http://www.tei-c.org/ns/1.0">Przyjaciel szczery</title>
        <author xmlns="http://www.tei-c.org/ns/1.0">Jan Daniecki</author>
        <respStmt xmlns="http://www.tei-c.org/ns/1.0">
          <resp xmlns="http://www.tei-c.org/ns/1.0">wyd.</resp> 
          <name xmlns="http://www.tei-c.org/ns/1.0">Maciej Eder</name>
        </respStmt>
      </titleStmt>
    </fileDesc>
   </teiHeader>
</TEI>

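Only after that is the XSLT transformation run. By then the author element is in the TEI namespace, so an unprefixed match="author" pattern, which by default matches only an author element in no namespace, no longer applies to it.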
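To check what a given parser and processor combination actually hands to the stylesheet, a small diagnostic transformation can help. This is a sketch, not part of the original stylesheets: it prints each element's local name followed by its namespace URI in braces, one element per line.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Plain-text report: one line per element -->
  <xsl:output method="text"/>

  <xsl:template match="*">
    <!-- Element name without any prefix -->
    <xsl:value-of select="local-name()"/>
    <xsl:text> {</xsl:text>
    <!-- Empty when the element is in no namespace -->
    <xsl:value-of select="namespace-uri()"/>
    <xsl:text>}&#10;</xsl:text>
    <!-- Recurse into child elements only, skipping text nodes -->
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
```

If the DTD defaults were applied, the braces should contain http://www.tei-c.org/ns/1.0 for every element. With an empty DTD they should be empty, which is the case where match="author" works without a prefix.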