Why does a CP-1252 ellipsis show up as u with ring above (ů) on some browsers?


For some reason, on some browsers, a CP-1252 ellipsis (0x85) is showing up as ů. I believe the server is claiming the page will be in UTF-8 (don't ask me why a UTF-8 server is serving CP-1252, that is out of scope). I would understand throwing a warning because it isn't valid UTF-8. I would understand it showing up as the Latin1 character U+0085 NEXT LINE (NEL). But I can't for the life of me figure out why it displays as U+016F LATIN SMALL LETTER U WITH RING ABOVE.

This is what I am seeing:

[Screenshot: the ellipsis rendered as ů]

And here is the hexdump -C output for the file:

00000000  78 78 78 78 78 78 78 78  78 78 78 78 78 78 78 78  |xxxxxxxxxxxxxxxx|
*
00000030  78 85 3c 2f 69 3e 0d 0a                           |x.</i>..|
00000038
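
To see what each candidate decoding does with these bytes, here is a minimal Python sketch. The byte string is copied from the last line of the dump; `decode_or_error` is a hypothetical helper, and CP852 is included only as an assumed candidate because it is a legacy code page that contains ů.

```python
# Last line of the hexdump: "x", 0x85, "</i>", CR, LF
tail = bytes.fromhex("78 85 3c 2f 69 3e 0d 0a")


def decode_or_error(data: bytes, encoding: str) -> str:
    """Return the decoded text, or a short note if strict decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"<error: {exc.reason}>"


# Compare the encodings named in the question plus the assumed CP852 candidate.
for enc in ("utf-8", "latin-1", "cp1252", "cp852"):
    print(f"{enc:8} {decode_or_error(tail, enc)!r}")
```

Strict UTF-8 decoding rejects the lone 0x85 byte, Latin-1 yields U+0085, CP-1252 yields the ellipsis, and CP852 yields ů.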

There is 1 answer

JosefZ

This is a flagrant case of mojibake. Some time ago I wrote a small .bat script that shows how the best-known OEM and ANSI code pages map to Unicode, and vice versa. Here is its output for byte 0x85:

==> alts.bat 0x85
CP/ACP  Hex  Codepoint  #Description   :show8bit 133 <--> 0x85)
------  ---  ---------  ------------------------
CP1250  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1251  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1252  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1253  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1254  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1255  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1256  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1257  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1258  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP437   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP737   0x85    0x0396  #GREEK CAPITAL LETTER ZETA
CP775   0x85    0x0123  #LATIN SMALL LETTER G WITH CEDILLA
CP850   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP852   0x85    0x016f  #LATIN SMALL LETTER U WITH RING ABOVE
CP855   0x85    0x0401  #CYRILLIC CAPITAL LETTER IO
CP857   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP860   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP861   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP862   0x85    0x05d5  #HEBREW LETTER VAV
CP863   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP864   0x85    0x2500  #FORMS LIGHT HORIZONTAL
CP865   0x85    0x00e0  #LATIN SMALL LETTER A WITH GRAVE
CP866   0x85    0x0415  #CYRILLIC CAPITAL LETTER IE
CP869   0x85            #UNDEFINED
CP874   0x85    0x2026  #HORIZONTAL ELLIPSIS
CP932   0x85            #DBCS LEAD BYTE
CP936   0x85            #DBCS LEAD BYTE
CP949   0x85            #DBCS LEAD BYTE
CP950   0x85            #DBCS LEAD BYTE

==>
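
The same lookup can be approximated without the .bat script. The following is only a sketch in Python, not the script used above: `decoders_yielding` and `CODE_PAGES` are hypothetical names, and the list uses Python's codec names for the code pages in the table (Python treats cp936 as an alias of gbk).

```python
# Code pages from the table, spelled as Python codec names.
CODE_PAGES = [
    "cp1250", "cp1251", "cp1252", "cp1253", "cp1254", "cp1255",
    "cp1256", "cp1257", "cp1258", "cp437", "cp737", "cp775",
    "cp850", "cp852", "cp855", "cp857", "cp860", "cp861", "cp862",
    "cp863", "cp864", "cp865", "cp866", "cp869", "cp874",
    "cp932", "cp936", "cp949", "cp950",
]


def decoders_yielding(byte_value, expected, code_pages=CODE_PAGES):
    """Return the code pages in which the single byte decodes to `expected`."""
    matches = []
    for cp in code_pages:
        try:
            decoded = bytes([byte_value]).decode(cp)
        except UnicodeDecodeError:
            continue  # byte is undefined, or is a lone DBCS lead byte
        if decoded == expected:
            matches.append(cp)
    return matches


print(decoders_yielding(0x85, "\u016f"))  # which pages turn 0x85 into ů
print(decoders_yielding(0x85, "\u2026"))  # which pages turn 0x85 into the ellipsis
```

Working backwards from the observed character to the code page in this way is a quick method for guessing which decoder a misbehaving client applied.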

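Only CP852 maps 0x85 to U+016F, which suggests that the affected browsers are decoding the page as CP852 rather than as UTF-8 or CP-1252.

And here is the reverse lookup for code point 0x2026 (sorry for the misaligned columns on the non-Windows code page lines):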

==> alts.bat 0x2026
CP/ACP  Hex  Codepoint  #Description   :show16bit 8230 <--> 0x2026
------  ---  ---------  -------------------------
CP1250  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1251  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1252  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1253  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1254  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1255  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1256  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1257  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP1258  0x85    0x2026  #HORIZONTAL ELLIPSIS
CP874   0x85    0x2026  #HORIZONTAL ELLIPSIS
CP932   0x8163  0x2026  #HORIZONTAL ELLIPSIS
CP936   0xA1AD  0x2026  #HORIZONTAL ELLIPSIS
CP949   0xA1A6  0x2026  #HORIZONTAL ELLIPSIS
CP950   0xA14B  0x2026  #HORIZONTAL ELLIPSIS
macCYRILLIC_CP  0xC9    0x2026  #HORIZONTAL ELLIPSIS
macGREEK_CP     0xC9    0x2026  #HORIZONTAL ELLIPSIS
macICELAND_CP   0xC9    0x2026  #HORIZONTAL ELLIPSIS
macLATIN2_CP    0xC9    0x2026  #HORIZONTAL ELLIPSIS
macROMAN_CP     0xC9    0x2026  #HORIZONTAL ELLIPSIS
macTURKISH_CP   0xC9    0x2026  #HORIZONTAL ELLIPSIS

==>
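
The reverse direction can be sketched the same way in Python, again as an illustration under assumptions rather than the original script: `encodings_of` and `PAGES` are hypothetical names, and the Mac entries use Python's `mac_*` codec names.

```python
ELLIPSIS = "\u2026"  # HORIZONTAL ELLIPSIS

# A sample of code pages; cp852 is included to show a page that lacks the ellipsis.
PAGES = [
    "cp1252", "cp852", "cp874", "cp932", "cp936", "cp949", "cp950",
    "mac_cyrillic", "mac_greek", "mac_iceland", "mac_latin2",
    "mac_roman", "mac_turkish",
]


def encodings_of(char, code_pages):
    """Map each code page to the hex bytes for `char`, skipping pages without it."""
    result = {}
    for cp in code_pages:
        try:
            result[cp] = char.encode(cp).hex().upper()
        except UnicodeEncodeError:
            continue  # this code page has no such character
    return result


for cp, hex_bytes in encodings_of(ELLIPSIS, PAGES).items():
    print(f"{cp:14} 0x{hex_bytes}")
```

CP852 drops out of the result because it has no ellipsis at all, so text saved as CP-1252 cannot round-trip through it.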

Further reading: Encodings and Code Pages