Chinese text encoding missing characters when viewed in web browser

Question

Asked by user2539827 on 14 December 2016 at 18:15 · 829 views

I have an HTML file which contains Chinese text. When I open the file in any web browser, some characters appear to be missing.

Here's an example copied from the browser window:

本函旨在邀請您參�� 定於

I know for a fact that all other characters seen here are correct aside from the missing ones (confirmed by a native Chinese speaker).

In the HTML header, I have a tag which signifies the file contains UTF-8 encoded characters:

<META http-equiv="Content-Type" content="text/html; charset=utf-8">

I've already tried some other charsets in this META tag, but so far it seems any encoding method I try aside from UTF-8 ends up looking worse.

I also considered the possibility that it is a font issue, so I installed 3 different traditional Chinese fonts on my system and forced Chrome to use them. None of them made any difference - missing characters were still present.

If I open the HTML file with Notepad++, here's what I can see:

https://i.stack.imgur.com/Ex3C1.png

If I select and copy-paste this text into regular MS Notepad, I get this:

本函旨在邀請您參劦nbsp;定於

So you can see here that the "xE5 x8A" visible in Notepad++ seems to have been replaced by 劦.

Is there any reason why the browser would be showing �� instead of 劦 in this scenario?

There is 1 answer

**John Machin** · Accepted Answer · 2016-12-18T09:52:49+00:00

Look again at the HTML file.

I see the first 2 bytes of a character encoded in UTF-8, followed by ... let's imagine there was originally a \xA0, and that this was mutated to &nbsp; when the file was created, by applying global substitutions to the UTF-8-encoded data.
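The suspected corruption can be reproduced in a minimal Python sketch. It assumes that the tool which generated the file replaced every 0xA0 byte with the ASCII text &nbsp; after the text had already been encoded as UTF-8. The function name `naive_nbsp_substitution` is hypothetical, and the two-character sample 參加 is only a guess at the intended text.

```python
def naive_nbsp_substitution(data: bytes) -> bytes:
    """Replace every 0xA0 byte with the ASCII entity, ignoring UTF-8 boundaries.

    Hypothetical reconstruction of the suspected bug, not the actual tool.
    """
    return data.replace(b"\xa0", b"&nbsp;")


# Assumed intended text: 參 is E5 8F 83 and 加 is E5 8A A0 in UTF-8.
original = "參加".encode("utf-8")
damaged = naive_nbsp_substitution(original)

print(original.hex())  # bytes before the substitution
print(damaged)         # the trailing A0 byte has become b"&nbsp;"

# A strict decoder rejects the damaged bytes outright.
try:
    damaged.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# A forgiving decoder substitutes U+FFFD for the truncated sequence,
# which is the kind of placeholder a browser displays.
rendered = damaged.decode("utf-8", errors="replace")
print(rendered)
```

Under that assumption, the last byte of 加 is swallowed by the substitution, leaving the two orphaned bytes E5 8A that are visible in Notepad++.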

However, \xE5\x8A\xA0 decodes as UTF-8 to U+52A0 (加), which is not the same as the alien character, U+52A6 (劦) ... so this is not close enough to be an answer.
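The code point arithmetic can be checked directly. The sketch below first decodes the three intact bytes strictly. It then shows what a hypothetical lenient decoder would produce if it accepted the & (0x26) at the start of &nbsp; as the third byte without checking that it is a valid continuation byte. Whether Notepad actually decodes this way is an assumption, not something confirmed here, and `lenient_three_byte` is only an illustrative helper.

```python
def lenient_three_byte(b0: int, b1: int, b2: int) -> int:
    """Combine three bytes as a UTF-8 sequence without validating them."""
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


# Strict decoding of the intact sequence E5 8A A0.
intact = b"\xe5\x8a\xa0".decode("utf-8")
print(f"U+{ord(intact):04X}")  # U+52A0

# Lenient combination of E5 8A with the ASCII ampersand (0x26).
guess = lenient_three_byte(0xE5, 0x8A, ord("&"))
print(f"U+{guess:04X}")  # U+52A6
```

If a decoder did behave like this, it would also account for the missing & in the pasted text, which reads 劦nbsp; rather than 劦&nbsp;.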
