In an XML document (either 1.0 or 1.1), is it permitted for an attribute that is typed as ENTITY in the DTD to have a parsed entity as its value?
The following is a table copied from the XML spec (section 4.4), summarizing the rules for entity inclusion. (The red lasso bits are my addition.)
My reading of this is: No, parsed entities are not permitted as ENTITY type attribute values. Forbidden is defined in the spec as leading to a fatal error.
However, here comes the mystery. If I present the document in listing 1 ....
Listing 1
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
<!ENTITY internal "stuff and nonsense">
<!NOTATION jpg SYSTEM "image/jpeg">
<!ENTITY file_pic SYSTEM "file.jpg" NDATA jpg>
<!ENTITY source-text SYSTEM "source-text.txt">
<!ELEMENT test-case EMPTY>
<!ATTLIST test-case source-entity ENTITY #REQUIRED>
]>
<container>
<!-- Test case 10. An internal general entity as an entity value is forbidden. -->
<test-case case-number="10" source-entity="internal"/>
<!-- Test case 11. An external parsed general entity as an entity value is forbidden. -->
<test-case case-number="11" source-entity="source-text"/>
<!-- Test case 12. When an unparsed entity as an entity value, and the processor is validating,
the processor must inform the application of the system and public (if any) identifiers for
both the entity and its associated notation. (In this case "file.jpg" and "image/jpeg"). -->
<test-case case-number="12" source-entity="file_pic"/>
<!-- Test case 13. Character references are not recognised in attribute values of type ENTITY. -->
<test-case case-number="13" source-entity="&#169;"/>
</container>
... to an XSLT processor (Saxon-HE 9.5.1.1N), transforming it with the identity transform (listing 2)
Listing 2
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="utf-8" omit-xml-declaration="yes" />
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
... one gets output as shown in listing 3 ...
Listing 3
<container>
<!-- Test case 10. An internal general entity as an entity value is forbidden. -->
<test-case case-number="10" source-entity="internal"/>
<!-- Test case 11. An external parsed general entity as an entity value is forbidden. -->
<test-case case-number="11" source-entity="source-text"/>
<!-- Test case 12. When an unparsed entity as an entity value, and the processor is validating,
the processor must inform the application of the system and public (if any) identifiers for
both the entity and its associated notation. (In this case "file.jpg" and "image/jpeg"). -->
<test-case case-number="12" source-entity="file_pic"/>
<!-- Test case 13. Character references are not recognised in attribute values of type ENTITY. -->
<test-case case-number="13" source-entity="©"/>
</container>
That is not the expected result! If parsed entities (test cases 10 and 11) are truly forbidden as attribute values, then the XML processor that Saxon uses to read the input document should throw a fatal error, and consequently there should be no output.
Q1.
What is happening here? The rules say forbidden, but the XML processor accepts it anyway.
Q2.
And another question: What is the point of this rule? Clearly the ENTITY type attribute is designed for specifying entities as the value of an attribute. It is fine to do so when the entity is unparsed external. So why would the fact that the processor has parsed an external entity, suddenly make that entity unsuitable as the value of an attribute?
Saxon uses SAX2:
so the entities-in-attributes test cases may be irrelevant to Saxon itself and depend instead on the underlying SAX2 parser. For instance, in C++:
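The same point can be sketched in Python, whose standard-library `xml.sax` module follows the SAX2 model on top of the non-validating expat parser. This is only an illustration under stated assumptions: it is not the parser Saxon uses, the names `Recorder` and `parse_listing` are hypothetical, and test case 13 is left out to keep the input short. The XML spec requires ENTITY-typed values to name an unparsed entity through the "Entity Name" validity constraint, and a non-validating parser is not obliged to check validity constraints.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

# Test cases 10-12 from listing 1, with the comments removed.
DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
<!ENTITY internal "stuff and nonsense">
<!NOTATION jpg SYSTEM "image/jpeg">
<!ENTITY file_pic SYSTEM "file.jpg" NDATA jpg>
<!ENTITY source-text SYSTEM "source-text.txt">
<!ELEMENT test-case EMPTY>
<!ATTLIST test-case source-entity ENTITY #REQUIRED>
]>
<container>
<test-case case-number="10" source-entity="internal"/>
<test-case case-number="11" source-entity="source-text"/>
<test-case case-number="12" source-entity="file_pic"/>
</container>
"""


class Recorder(ContentHandler, DTDHandler):
    """Hypothetical handler: records attribute values and DTD notifications."""

    def __init__(self):
        super().__init__()
        self.values = []      # source-entity values, in document order
        self.unparsed = {}    # entity name -> (system id, notation name)
        self.notations = {}   # notation name -> system id

    def startElement(self, name, attrs):
        if name == "test-case":
            self.values.append(attrs.getValue("source-entity"))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed[name] = (systemId, ndata)

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId


def parse_listing(data):
    """Parse with the default (non-validating) SAX parser and return the record."""
    rec = Recorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(rec)
    parser.setDTDHandler(rec)
    parser.parse(io.BytesIO(data))
    return rec


if __name__ == "__main__":
    rec = parse_listing(DOC)
    print(rec.values, rec.unparsed, rec.notations)
```

With a non-validating parser such as this one, the parse is expected to complete without a fatal error, and the attribute values arrive as plain strings. Only the unparsed entity and its notation are announced through the DTD handler. Whether test cases 10 and 11 are rejected therefore seems to depend on whether the underlying parser validates.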
and external entities expose security vulnerabilities:
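As a hedged illustration of the usual mitigation, the sketch below uses Python's standard-library `xml.sax` (again, not the parser Saxon uses) and explicitly switches off the external general entities feature before parsing. The document, the `file:///hypothetical/secret.txt` system identifier, and the names `TextCollector`, `RefusingResolver`, and `parse_without_external_entities` are all made up for the example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges

# Hypothetical document that references an external parsed entity in content.
XXE_DOC = b"""<?xml version="1.0"?>
<!DOCTYPE r [
<!ENTITY ext SYSTEM "file:///hypothetical/secret.txt">
]>
<r>&ext;</r>
"""


class TextCollector(ContentHandler):
    """Collects character data reported by the parser."""

    def __init__(self):
        super().__init__()
        self.text = []

    def characters(self, content):
        self.text.append(content)


class RefusingResolver(EntityResolver):
    """Records any attempt to resolve an external entity, then refuses it."""

    def __init__(self):
        self.requested = []

    def resolveEntity(self, publicId, systemId):
        self.requested.append(systemId)
        raise xml.sax.SAXException("external entity access refused")


def parse_without_external_entities(data):
    """Parse with external general entities disabled; return (text, requests)."""
    collector = TextCollector()
    resolver = RefusingResolver()
    parser = xml.sax.make_parser()
    # Features must be set before parsing starts.
    parser.setFeature(feature_external_ges, False)
    parser.setContentHandler(collector)
    parser.setEntityResolver(resolver)
    parser.parse(io.BytesIO(data))
    return "".join(collector.text), resolver.requested


if __name__ == "__main__":
    print(parse_without_external_entities(XXE_DOC))
```

With the feature disabled, the reference to `ext` is skipped rather than fetched, so the resolver is never consulted and no file content reaches the application. Other SAX2 implementations expose comparable switches under their own feature names.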