Converting utf-8 to latin-1 in Python

Question

12.1k views Asked by OregonTrail At 14 November 2014 at 20:24

I want to do this:

Take the bytes of this utf-8 string:

访视频

Encode those bytes in latin-1 and print the result:

è®¿è§†é¢‘

How do I do this in Python?

# -*- coding: utf-8 -*-
s = u'访视频'.encode('latin-1')

Causes this exception:

s = u'访视频'.encode('latin-1')
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-2: ordinal not in range(256)

There are 2 answers

Mazdak On 14 November 2014 at 20:30

You first need to encode the string to UTF-8 (which can encode any Unicode string and is fully compatible with 7-bit ASCII: any ASCII byte string is also a valid UTF-8 encoded string), then decode the resulting bytes as Latin-1:

>>> u'访视频'.encode('UTF-8').decode('latin-1')
u'\xe8\xae\xbf\xe8\xa7\x86\xe9\xa2\x91'

Note: UTF-8 can encode any Unicode character. It is also backward compatible with ASCII: a pure ASCII file is also a valid UTF-8 file, and a UTF-8 file that uses only ASCII characters is byte-for-byte identical to the ASCII file with the same characters.
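The ASCII compatibility mentioned in the note can be checked directly. This is only a small sketch; the variable names are illustrative and not part of the original answer.

```python
# -*- coding: utf-8 -*-
# Pure ASCII text produces the same bytes under both codecs.
ascii_text = u'hello'
ascii_bytes = ascii_text.encode('ascii')
utf8_bytes = ascii_text.encode('utf-8')
same_bytes = (ascii_bytes == utf8_bytes)

# Non-ASCII text needs several bytes per character in UTF-8.
chinese_bytes = u'访视频'.encode('utf-8')
bytes_per_char = len(chinese_bytes) // 3
```

Here `same_bytes` comes out true, and each of the three Chinese characters takes three bytes in UTF-8.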

**abarnert** · Accepted Answer · 2014-11-14T20:29:21+00:00

What you're asking to do is literally impossible. You can't encode those characters to Latin-1, because those characters don't exist in Latin-1.
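To see why the encode fails, it may help to look at the code points. The following is a minimal sketch (the variable names are my own): Latin-1 only covers code points 0 through 255, and all three characters are far above that.

```python
# -*- coding: utf-8 -*-
text = u'访视频'

# Latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so only characters with ord(ch) < 256 can be encoded.
code_points = [ord(ch) for ch in text]
fits_latin1 = all(cp < 256 for cp in code_points)

try:
    text.encode('latin-1')
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True
```

With this input, `fits_latin1` is false and `encode_failed` is true, which matches the exception in the question.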

To get the output you want, you want to decode the UTF-8 bytes as if they were Latin-1. Like this:

s = u'访视频'.encode('utf-8').decode('latin-1')

However, your desired output doesn't look like actual Latin-1, because in Latin-1, characters \x86 and \x91 are non-printable, so you're going to get this:

è®¿è§ é¢

(Notice that space in the middle in place of †, and the missing ‘ at the end; those are actually invisible control characters, not spaces.)
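If you want to confirm which positions hold the invisible characters, one option is to ask `unicodedata` for each character's category. This is just an illustrative sketch; the names `decoded` and `control_chars` are assumptions of mine.

```python
import unicodedata

# The UTF-8 bytes of the original string, decoded as Latin-1.
decoded = b'\xe8\xae\xbf\xe8\xa7\x86\xe9\xa2\x91'.decode('latin-1')

# Category 'Cc' marks control characters; collect their index and code point.
control_chars = [
    (index, hex(ord(ch)))
    for index, ch in enumerate(decoded)
    if unicodedata.category(ch) == 'Cc'
]
```

For this string, `control_chars` contains two entries, one for `\x86` and one for `\x91`.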

It looks like you want a Latin-1 superset, probably Windows codepage 1252. In which case what you really want is:

s = u'访视频'.encode('utf-8').decode('cp1252')
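Putting the two decodings side by side, as a rough sketch with variable names of my own choosing, shows where they differ and that the cp1252 result can be reversed for this input. One caveat: Python's cp1252 codec leaves a few byte values (0x81, for example) unmapped, so the same trick can raise `UnicodeDecodeError` on other strings.

```python
# -*- coding: utf-8 -*-
text = u'访视频'
utf8_bytes = text.encode('utf-8')

# Same bytes, two interpretations.
as_latin1 = utf8_bytes.decode('latin-1')   # 0x86 and 0x91 become control characters
as_cp1252 = utf8_bytes.decode('cp1252')    # 0x86 and 0x91 become printable punctuation

# Reversing the steps recovers the original text for this input.
round_trip = as_cp1252.encode('cp1252').decode('utf-8')

# cp1252 does not define every byte value, so decoding can fail.
try:
    b'\x81'.decode('cp1252')
    byte_0x81_decodes = True
except UnicodeDecodeError:
    byte_0x81_decodes = False
```

If you need something that never fails on arbitrary bytes, `latin-1` is the safer choice, since it maps all 256 byte values.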
