I am transforming a TEI-encoded text document (TEI, the Text Encoding Initiative, is a standard for encoding text documents) using XSLT 1.0 and 2.0 with various processors.
I am experiencing a very peculiar problem: depending on which DTD I reference in the DOCTYPE declaration of the XML file, I get different results. An example input file:
<!DOCTYPE TEI SYSTEM "tei_lite.dtd">
<TEI>
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Przyjaciel szczery</title>
        <author>Jan Daniecki</author>
        <respStmt>
          <resp>wyd.</resp>
          <name>Maciej Eder</name>
        </respStmt>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>
The following XSLT should delete the author node:
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="author"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
However, the stylesheet only deletes the node if I replace the DTD (which would be too long to post here) with an empty one.
I found out why: using the DTD introduces a namespace. For the stylesheet to work, I need to declare that namespace and use its prefix in the match pattern; then everything works:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0" version="2.0">
  <xsl:template match="tei:author"/>
  <xsl:template match="*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
I thought that only default attribute nodes are affected by the DTD definition during XSLT transformation. Could someone summarize how and when the DTD adds namespaces that need to be accounted for?
Thanks!
Well, a DTD can also affect entities, but default attributes are precisely what is happening here.
Assuming the DTD you are using is one like this, then note:
So that DTD says that the <TEI> element has a default value of http://www.tei-c.org/ns/1.0 for its xmlns attribute. Unless the element has an explicit xmlns attribute in the source, it should be treated as if it has xmlns="http://www.tei-c.org/ns/1.0" on it.
Now, while to a namespace-aware XML process like an XSLT transform there is a difference between a namespace declaration and an attribute, to a non-namespace-aware XML process like DTD validation, namespace declarations are just like other attributes. (This is how namespaces could be added to XML in a backwards-compatible way.)
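To make the default-attribute mechanism concrete, here is a minimal sketch of a declaration of that kind, written as an internal DTD subset so the example is self-contained. It is an illustration only and is not copied from tei_lite.dtd; the tiny content model is made up for the example.

```xml
<?xml version="1.0"?>
<!DOCTYPE TEI [
  <!-- Hypothetical, heavily reduced content model for illustration -->
  <!ELEMENT TEI (author)>
  <!ELEMENT author (#PCDATA)>
  <!-- To the DTD, xmlns is an ordinary attribute with a default value -->
  <!ATTLIST TEI xmlns CDATA "http://www.tei-c.org/ns/1.0">
]>
<TEI>
  <author>Jan Daniecki</author>
</TEI>
```

A parser that reads the DTD should report <TEI> as if xmlns="http://www.tei-c.org/ns/1.0" had been written on it, so a namespace-aware consumer sees both TEI and author in that namespace, and an unprefixed match="author" no longer matches.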
In fact most (all?) of the elements are defined this way, so after DTD validation your document:
Becomes:
And then after that the XSLT transform is processed.
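As a rough sketch of what the XSLT processor effectively sees for the input from the question, assuming every element involved has the defaulted xmlns attribute described above (the DTD itself is not shown here, so that is an assumption):

```xml
<!-- Effective document after attribute defaults from the DTD are applied -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader xmlns="http://www.tei-c.org/ns/1.0">
    <fileDesc xmlns="http://www.tei-c.org/ns/1.0">
      <titleStmt xmlns="http://www.tei-c.org/ns/1.0">
        <title xmlns="http://www.tei-c.org/ns/1.0">Przyjaciel szczery</title>
        <author xmlns="http://www.tei-c.org/ns/1.0">Jan Daniecki</author>
        <respStmt xmlns="http://www.tei-c.org/ns/1.0">
          <resp xmlns="http://www.tei-c.org/ns/1.0">wyd.</resp>
          <name xmlns="http://www.tei-c.org/ns/1.0">Maciej Eder</name>
        </respStmt>
      </titleStmt>
    </fileDesc>
  </teiHeader>
</TEI>
```

Every element is now in the http://www.tei-c.org/ns/1.0 namespace, which is why the pattern has to be tei:author (with the tei prefix bound to that URI) rather than plain author.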