Why are the ⟨ and ⟩ characters handled so oddly?


http://www-archive.mozilla.org/newlayout/testcases/layout/entities.html has a section called "Miscellaneous Technical". Two of the characters in that section are "left-pointing angle bracket" (&lang; and &#9001;) and "right-pointing angle bracket" (&rang; and &#9002;). These characters behave strangely in several ways (I tested the first three items in the following list on my first-generation Moto X running Android 4.4.4):

  1. In Chrome 43.0.2357.93 on Android, all four entities (&lang;, &#9001;, &rang;, and &#9002;) are invisible (they just look like spaces).
  2. In Opera 30.0.1856.92967 on Android, all four entities are invisible (they just look like spaces).
  3. In Firefox 38.0.5 on Android, the two named entities (&lang; and &rang;) look like black blocks, but the two coded entities (&#9001; and &#9002;) look correct.
  4. I used Xcode's iOS Simulator to simulate an iPhone 5s running iOS 8.3 (12F69) and loaded the link in Safari. All four entities looked correct, but the two named entities (&lang; and &rang;) looked very different from the two coded entities (&#9001; and &#9002;).
  5. If you convert the page to HTML5 and run it through the validator, it outputs "Warning: Text run is not in Unicode Normalization Form C." for the two coded entities (&#9001; and &#9002;), but no other entity on the entire page receives any warnings or errors.

I think the most interesting items in the above list are the last three since it appears that the named entities and the coded entities are not treated equally.

All the other characters seem fine, at least from what I saw. What makes these characters so odd?

Here is the "Miscellaneous Technical" section from the link at the top of this question, just in case the link ever stops working:

<h3>Miscellaneous Technical</h3>

<table>
  <caption align=bottom>
    [1] lang is NOT the same character as U+003C 'less than' or U+2039 'single left-pointing angle quotation mark'<br>
    [2] rang is NOT the same character as U+003E 'greater than' or U+203A 'single right-pointing angle quotation mark'
  </caption>
  <tr>
    <th>Entity</th>
    <th>Code</th>
    <th>Named</th>
    <th>Coded</th>
    <th>Description</th>
  </tr>
  <tr>
    <td>lceil</td>
    <td>8968</td>
    <td>"&lceil;"</td>
    <td>"&#8968;"</td>
    <td>left ceiling = apl upstile</td>
  </tr>
  <tr>
    <td>rceil</td>
    <td>8969</td>
    <td>"&rceil;"</td>
    <td>"&#8969;"</td>
    <td>right ceiling</td>
  </tr>
  <tr>
    <td>lfloor</td>
    <td>8970</td>
    <td>"&lfloor;"</td>
    <td>"&#8970;"</td>
    <td>left floor = apl downstile</td>
  </tr>
  <tr>
    <td>rfloor</td>
    <td>8971</td>
    <td>"&rfloor;"</td>
    <td>"&#8971;"</td>
    <td>right floor</td>
  </tr>
  <tr>
    <td>lang</td>
    <td>9001</td>
    <td>"&lang;"</td>
    <td>"&#9001;"</td>
    <td>left-pointing angle bracket = bra [1]</td>
  </tr>
  <tr>
    <td>rang</td>
    <td>9002</td>
    <td>"&rang;"</td>
    <td>"&#9002;"</td>
    <td>right-pointing angle bracket = ket [2]</td>
  </tr>
</table>
Answer by Andrey:

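According to the HTML5 specification, some named character references (entities) map to different code points than they did in the HTML 4 specification that the page in the question is based on. In particular:

&lang; is U+27E8 (&#10216;) instead of U+2329 (&#9001;)

&rang; is U+27E9 (&#10217;) instead of U+232A (&#9002;)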

In practice, modern browsers appear to apply the HTML5 mapping even to documents with an HTML 4 doctype. That explains why &lang; and &#9001; are displayed differently: they now resolve to two different characters.
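To see the two mappings side by side, here is a minimal Python sketch using the standard library's `html.entities` module. It assumes, as in CPython, that `name2codepoint` reflects the HTML 4.01 entity set and `html5` reflects the HTML5 named character references; `entity_mappings` is a hypothetical helper name used only for illustration.

```python
from __future__ import annotations

import html
from html.entities import html5, name2codepoint


def entity_mappings(name: str) -> tuple[int, int]:
    """Return (HTML 4.01 code point, HTML5 code point) for a named entity.

    Hypothetical helper: name2codepoint is the HTML 4.01 table, while the
    keys of the html5 table include the trailing semicolon.
    """
    return name2codepoint[name], ord(html5[name + ";"])


for name in ("lang", "rang"):
    old, new = entity_mappings(name)
    print(f"&{name};  HTML 4: U+{old:04X} (&#{old};)  HTML5: U+{new:04X} (&#{new};)")

# html.unescape resolves named references with the HTML5 table (as modern
# browsers do), so the named and numeric forms no longer decode to the same character
print(html.unescape("&lang;") == html.unescape("&#9001;"))  # False
```

Running it should print U+2329 vs U+27E8 for &lang; and U+232A vs U+27E9 for &rang;, which matches the two different glyphs seen in Safari.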

The fact that &lang; and &rang; are not rendered properly in the mobile browsers may be due to a lack of font support for U+27E8 and U+27E9.
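The same code point difference likely also accounts for the validator warning in the question. U+2329 and U+232A (what &#9001; and &#9002; produce) have singleton canonical decompositions to U+3008 LEFT ANGLE BRACKET and U+3009 RIGHT ANGLE BRACKET, so text containing them is never in Normalization Form C, whereas U+27E8 and U+27E9 (the HTML5 targets of &lang; and &rang;) have no decomposition. A minimal sketch using Python's standard `unicodedata` module to check this; `nfc_report` is a hypothetical helper name:

```python
import unicodedata


def nfc_report(ch: str) -> str:
    """Describe what NFC normalization does to a single character (hypothetical helper)."""
    nfc = unicodedata.normalize("NFC", ch)
    if nfc == ch:
        status = "unchanged"
    else:
        status = f"becomes U+{ord(nfc):04X} {unicodedata.name(nfc)}"
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}: {status}"


# U+2329/U+232A: what &#9001;/&#9002; decode to
# U+27E8/U+27E9: what &lang;/&rang; decode to under HTML5
for ch in ("\u2329", "\u232a", "\u27e8", "\u27e9"):
    print(nfc_report(ch))
```

Only the first two characters should be reported as changed, which is consistent with the validator flagging only the coded entities.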