How do I figure out what encoding was used to produce some garbled Chinese text?

Question

Asked by Matthew Chatham on 12 September 2017 at 18:34 (1.5k views)

I have some text which was translated from English into Simplified Chinese. However, when I received the file back, the characters were garbled. So, for example, we have a line that reads "ÎªÁËÓÐÐ§¡¢¸ßÐ§µØÊµÏÖÄ¿±ê£¬Äú×îÐèÒªµÄÊÇÊ²Ã´£¿" rather than containing the Chinese characters I would expect.

I've tried pasting the above string into a Python interpreter, converting it to Unicode, and decoding with various Chinese character sets, to no avail. Does anyone have insight on this? Thank you.

There is 1 answer

**Josh Lee** · Accepted Answer · 2017-09-12T18:49:10+00:00

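Use chardet. Encoding the garbled string as Latin-1 ('l1' is a Python alias for it) recovers the original bytes, which chardet can then identify:

>>> import chardet
>>> s = "ÎªÁËÓÐÐ§¡¢¸ßÐ§µØÊµÏÖÄ¿±ê£¬Äú×îÐèÒªµÄÊÇÊ²Ã´£¿"
>>> chardet.detect(s.encode('l1'))
{'encoding': 'GB2312', 'confidence': 0.99, 'language': 'Chinese'}
>>> s.encode('l1').decode('gb2312')
'为了有效、高效地实现目标，您最需要的是什么？'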
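If chardet is not available, the same idea can be tried by hand with only the standard library. The following is a rough sketch, not part of the original answer: the helper name `recover_candidates` and both codec lists are assumptions chosen for illustration. It undoes the mistaken decode with each "wrong" codec, then tries each plausible "right" codec and keeps every pair that decodes without error. Several pairs may succeed, so a human still has to pick the result that reads as sensible Chinese.

```python
# Hypothetical helper; the codec lists below are illustrative guesses, not exhaustive.
WRONG_CODECS = ["latin-1", "cp1252"]  # codecs the bytes may have been misread as
RIGHT_CODECS = ["gb2312", "gbk", "gb18030", "big5", "utf-8"]  # codecs they may really be in


def recover_candidates(garbled):
    """Return (wrong, right, text) for every codec pair that round-trips without error."""
    results = []
    for wrong in WRONG_CODECS:
        try:
            raw = garbled.encode(wrong)  # undo the mistaken decode to get the raw bytes
        except UnicodeEncodeError:
            continue  # the text contains characters this codec cannot produce
        for right in RIGHT_CODECS:
            try:
                text = raw.decode(right)  # reinterpret the bytes with a candidate codec
            except UnicodeDecodeError:
                continue  # the bytes are not valid in this codec
            results.append((wrong, right, text))
    return results


garbled = "ÎªÁËÓÐÐ§¡¢¸ßÐ§µØÊµÏÖÄ¿±ê£¬Äú×îÐèÒªµÄÊÇÊ²Ã´£¿"
for wrong, right, text in recover_candidates(garbled):
    print(wrong, "->", right, ":", text)
```

A successful decode only shows that the bytes are valid in that codec, not that the codec is the right one, which is why chardet's statistical guess is usually the more convenient first step.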
