Are parsed entities permitted as entity values in XML?

In an ordinary XML document (either 1.0 or 1.1), may an attribute whose type is declared as ENTITY in the DTD take the name of a parsed entity as its value?

The following is a table copied from the XML spec (section 4.4), summarizing the rules for entity inclusion. (The red lasso marks are my addition.)

Rules for entity inclusion

My reading of this is: No, parsed entities are not permitted as ENTITY type attribute values. Forbidden is defined in the spec as leading to a fatal error.

However, here comes the mystery. If I present the document in listing 1 ...

Listing 1

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY internal "stuff and nonsense">
 <!NOTATION jpg SYSTEM "image/jpeg"> 
 <!ENTITY file_pic SYSTEM "file.jpg" NDATA jpg>
 <!ENTITY source-text SYSTEM "source-text.txt">
 <!ELEMENT test-case EMPTY>
 <!ATTLIST test-case source-entity ENTITY #REQUIRED>
]>
<container>
  <!-- Test case 10. An internal general entity as an entity value is forbidden. -->
  <test-case case-number="10" source-entity="internal"/>

  <!-- Test case 11. An external parsed general entity as an entity value is forbidden. -->
  <test-case case-number="11" source-entity="source-text"/>

  <!-- Test case 12. When an unparsed entity as an entity value, and the processor is validating, 
       the processor must inform the application of the system and public (if any) identifiers for
        both the entity and its associated notation. (In this case "file.jpg" and "image/jpeg"). -->
  <test-case case-number="12" source-entity="file_pic"/>

  <!-- Test case 13. Character references are not recognised in attribute values of type ENTITY. -->
  <test-case case-number="13" source-entity="&#169;"/>
</container>

... to an XSLT processor (Saxon-HE 9.5.1.1N), transforming it with the identity transform (listing 2)

Listing 2

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" encoding="utf-8" omit-xml-declaration="yes" />

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

... one gets output as shown in listing 3 ...

Listing 3

<container>
  <!-- Test case 10. An internal general entity as an entity value is forbidden. -->
  <test-case case-number="10" source-entity="internal"/>

  <!-- Test case 11. An external parsed general entity as an entity value is forbidden. -->
  <test-case case-number="11" source-entity="source-text"/>

  <!-- Test case 12. When an unparsed entity as an entity value, and the processor is validating, 
       the processor must inform the application of the system and public (if any) identifiers for
        both the entity and its associated notation. (In this case "file.jpg" and "image/jpeg"). -->
  <test-case case-number="12" source-entity="file_pic"/>

  <!-- Test case 13. Character references are not recognised in attribute values of type ENTITY. -->
  <test-case case-number="13" source-entity="©"/>
</container>

That is not the expected result! If parsed entities (test cases 10 and 11) are truly forbidden as attribute values, then the XML processor that Saxon uses to read the input document should throw a fatal error, and consequently there should be no output.

Q1.

What is happening here? The rules say forbidden, but the XML processor accepts it anyway.

Q2.

And another question: What is the point of this rule? Clearly the ENTITY type attribute is designed for specifying entities as the value of an attribute. It is fine to do so when the entity is unparsed external. So why would the fact that the processor has parsed an external entity, suddenly make that entity unsuitable as the value of an attribute?

There is 1 answer

Answer by Paul Sweatte

Saxon uses SAX2:

By default Saxon uses an XML parser that supports the SAX2 interface. Saxon has been tested successfully in the past with a wide variety of such parsers including Ælfred, Xerces, Lark, SUN Project X, Crimson, Piccolo, Oracle XML, xerces, xml4j, and xp. The recommended parser is the Apache version of Xerces (we have found this to be more reliable than the version bundled in the JDK). By default, however, Saxon uses the parser that comes with the Java platform. The parser must be SAX2-compliant. All the relevant JAR files must be installed on your Java CLASSPATH.

So the entities-in-attributes test cases may be irrelevant, depending on the underlying SAX2 parser. For instance, in C++:

However, MSXML SAX2 does not report entities in attributes. They are quietly skipped.
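To see what a SAX-style parser actually hands to the application for listing 1, here is a minimal sketch using Python's standard `xml.sax` module (backed by expat, a non-validating parser). This is an assumption-laden stand-in, not the parser Saxon used, and the names `EntityRecorder` and `parse_listing` are made up for illustration. It records the unparsed entity and notation declarations reported through the DTD handler, plus the `source-entity` value of each test case.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, DTDHandler

# Listing 1 from the question, with the comments removed.
DOC = b"""<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE a-doc-type [
 <!ENTITY internal "stuff and nonsense">
 <!NOTATION jpg SYSTEM "image/jpeg">
 <!ENTITY file_pic SYSTEM "file.jpg" NDATA jpg>
 <!ENTITY source-text SYSTEM "source-text.txt">
 <!ELEMENT test-case EMPTY>
 <!ATTLIST test-case source-entity ENTITY #REQUIRED>
]>
<container>
  <test-case case-number="10" source-entity="internal"/>
  <test-case case-number="11" source-entity="source-text"/>
  <test-case case-number="12" source-entity="file_pic"/>
  <test-case case-number="13" source-entity="&#169;"/>
</container>
"""


class EntityRecorder(ContentHandler, DTDHandler):
    """Hypothetical handler: collects DTD events and attribute values."""

    def __init__(self):
        super().__init__()
        self.notations = {}    # notation name -> system identifier
        self.unparsed = {}     # entity name -> (system identifier, notation name)
        self.attr_values = {}  # case-number -> source-entity value

    def notationDecl(self, name, publicId, systemId):
        self.notations[name] = systemId

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.unparsed[name] = (systemId, ndata)

    def startElement(self, name, attrs):
        if name == "test-case":
            case = attrs.getValue("case-number")
            self.attr_values[case] = attrs.getValue("source-entity")


def parse_listing(doc=DOC):
    """Parse the document without validation and return the recorder."""
    recorder = EntityRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(recorder)
    parser.setDTDHandler(recorder)
    parser.parse(io.BytesIO(doc))
    return recorder


if __name__ == "__main__":
    rec = parse_listing()
    for case, value in rec.attr_values.items():
        # Look up whether the attribute value names a declared unparsed entity.
        print(case, repr(value), rec.unparsed.get(value, "not an unparsed entity"))
```

With this parser, only `file_pic` is reported as an unparsed entity (system identifier `file.jpg`, notation `jpg`). The values `internal` and `source-text` arrive as plain attribute strings and no error is raised, because expat does not check validity constraints. Other parsers, validating ones in particular, may behave differently.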

External entities also expose security vulnerabilities:

Java applications using XML libraries are particularly vulnerable to XML External Entities (XXE) because the default settings for most Java XML parsers is to have XXE enabled. To use these parsers safely, you have to explicitly disable XXE in the parser you use.
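The quote is about Java parsers. As a rough analogue only, here is a sketch of the same precaution with Python's standard `xml.sax` module, which exposes the SAX2 features `feature_external_ges` and `feature_external_pes`. The document, the class `TextCollector`, and the function `parse_without_external_entities` are hypothetical. The file `source-text.txt` is assumed not to exist, so the parse would fail if the parser tried to fetch it.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)

# An external parsed entity referenced in element content.
DOC = b"""<!DOCTYPE d [
 <!ENTITY source-text SYSTEM "source-text.txt">
]>
<d>before-&source-text;-after</d>
"""


class TextCollector(ContentHandler):
    """Hypothetical handler: accumulates character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_without_external_entities(doc=DOC):
    """Parse with external general and parameter entities switched off."""
    parser = xml.sax.make_parser()
    # Explicitly refuse to load external entities, whatever the default is.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(doc))
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(parse_without_external_entities())
```

With both features off, the reference to `source-text` contributes nothing to the content and the external file is never opened. In Java the equivalent switches are parser-specific, so check the documentation of the parser you actually use.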
