Disallowed decimal numeric character reference: forbidden or treated as text?

According to the HTML 5.1 spec, under "Decimal numeric character reference":

The ampersand must be followed by a "#" (U+0023) character, followed by one or more ASCII digits, representing a base-ten integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a ";" (U+003B) character.

and below:

The numeric character reference forms described above are allowed to reference any Unicode code point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), surrogates (U+D800–U+DFFF), and control characters other than space characters.
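Read literally, the two quoted paragraphs describe a syntax plus a restriction on the referenced code point. A minimal Python sketch of that reading follows. The function names are hypothetical, and it assumes that "control characters" means Unicode general category Cc and that "space characters" means space, tab, LF, FF and CR. It only checks whether a reference is conforming; it says nothing about what a parser does with a non-conforming one.

```python
import re
import unicodedata

# "&#", then one or more ASCII digits, then ";".
DECIMAL_REFERENCE = re.compile(r"&#([0-9]+);")

# Assumption: HTML's "space characters" are space, tab, LF, FF and CR.
SPACE_CHARACTERS = {0x20, 0x09, 0x0A, 0x0C, 0x0D}


def is_noncharacter(cp):
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_allowed_code_point(cp):
    """Apply the quoted restriction to a single code point."""
    if cp > 0x10FFFF:
        return False  # not a Unicode code point at all
    if cp in (0x0000, 0x000D):
        return False
    if 0xD800 <= cp <= 0xDFFF or is_noncharacter(cp):
        return False
    # Assumption: "control character" means general category Cc.
    is_control = unicodedata.category(chr(cp)) == "Cc"
    return not is_control or cp in SPACE_CHARACTERS


def is_conforming_decimal_reference(text):
    """True if text is exactly one decimal reference to an allowed code point."""
    match = DECIMAL_REFERENCE.fullmatch(text)
    return match is not None and is_allowed_code_point(int(match.group(1)))
```

Under these assumptions `&#65;` and `&#10;` pass the check, while `&#0;`, `&#13;` and `&#1;` do not.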

I am confused. Does this mean that references to characters that cannot be referenced (like U+0000 or U+000D) are forbidden, or are they just treated as text rather than as references?

TL;DR Should I throw a validation error on entities that cannot be referenced, like `&#13;`, or treat them just as text?

There are 2 answers

Nisse Engström (best answer)

Section 8.2.4.69, "Tokenizing character references", says:

Otherwise, if the number is in the range 0xD800 to 0xDFFF or is greater than 0x10FFFF, then this is a parse error. Return a U+FFFD REPLACEMENT CHARACTER character token.

Otherwise, return a character token for the Unicode character whose code point is that number. Additionally, if the number is in the range 0x0001 to 0x0008, 0x000D to 0x001F, 0x007F to 0x009F, 0xFDD0 to 0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or 0x10FFFF, then this is a parse error.
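As a rough illustration, the two quoted paragraphs can be sketched in Python as below. The function name and the `(character, is_parse_error)` return shape are hypothetical. The sketch covers only the quoted text: the earlier steps of the algorithm, which precede the first "Otherwise", are not modelled.

```python
# U+xFFFE and U+xFFFF for planes 0 through 16, matching the list in the quote.
PLANE_END_NONCHARACTERS = {
    (plane << 16) | low
    for plane in range(0x11)
    for low in (0xFFFE, 0xFFFF)
}


def tokenize_numeric_reference(number):
    """Return (character, is_parse_error) for an already-parsed number.

    Only the two quoted paragraphs are implemented; anything handled by
    earlier steps of the spec's algorithm is out of scope here.
    """
    # Surrogates and out-of-range numbers: parse error, U+FFFD is returned.
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        return "\uFFFD", True

    # Otherwise the character itself is returned; some ranges are
    # additionally flagged as a parse error.
    is_parse_error = (
        0x0001 <= number <= 0x0008
        or 0x000D <= number <= 0x001F
        or 0x007F <= number <= 0x009F
        or 0xFDD0 <= number <= 0xFDEF
        or number == 0x000B
        or number in PLANE_END_NONCHARACTERS
    )
    return chr(number), is_parse_error
```

Under this reading, `tokenize_numeric_reference(13)` flags a parse error but still yields the U+000D character, so the reference is neither silently accepted nor left in the output as literal text.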

Etheryte

As far as I can find, the spec doesn't say, but most (if not all) modern browsers still treat them as characters. If the code point falls outside the range of known characters, an unknown-character marker is rendered instead:

Sample

However, an answer drawing from credible (specification) sources would be better as I believe this question is widely applicable.

Also see this answer to a related question.