Python 2 vs Python 3 - Encoding

I have a simple script:

# -*- coding: utf-8 -*-
text = "12É45678"
print(len(text))

Note the uppercase E with an acute accent (É).

When I run this in Python 2, the result is 9; when I run it in Python 3, the result is 8.

How can I get 8 in Python 2 (natively)?

1 Answer

Answer by Brian61354270

In Python 2, str is a naive sequence of bytes (what Python 3 calls bytes). In a UTF-8 source file, É is stored as two bytes (0xC3 0x89), so len() counts 9 bytes rather than 8 characters. To interpret the bytes as Unicode code points, decode them into a unicode object:

# -*- coding: utf-8 -*-
text = "12É45678"                 # a byte string (str) in Python 2
print(len(text))                  # number of bytes
print(len(text.decode("utf-8")))  # number of Unicode code points

In Python 2, this prints

9
8
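
To see where the extra byte comes from, here is a minimal sketch that encodes the same text to UTF-8 and inspects the result. It is meant to run on both Python 2 and Python 3. The variable names (encoded, e_bytes) are illustrative only, and the string uses the \u00c9 escape for É so the result does not depend on how the file is saved.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

# "\u00c9" is É; with unicode_literals this is a text string on Python 2 as well.
text = "12\u00c945678"

# The UTF-8 encoding of the text: the bytes a Python 2 str literal
# would contain in a UTF-8 source file.
encoded = text.encode("utf-8")

print(len(text))     # 8 code points
print(len(encoded))  # 9 bytes

# É alone occupies two bytes in UTF-8.
e_bytes = bytearray("\u00c9".encode("utf-8"))
print([hex(b) for b in e_bytes])  # ['0xc3', '0x89']
```

Every other character in the string is ASCII and takes one byte, so the two-byte É accounts for the difference between 9 and 8.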

See also the Unicode HOWTO from the Python 2 documentation.
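
As an alternative to decoding at the point of use, the literal itself can be written as a unicode string with the u prefix. This is a sketch that assumes the file is saved as UTF-8, matching the coding declaration; the u prefix is also accepted by Python 3.3 and later, where it has no effect.

```python
# -*- coding: utf-8 -*-
# The u prefix makes this a unicode object on Python 2,
# so len() counts code points instead of bytes.
text = u"12É45678"
print(len(text))  # 8
```

Adding from __future__ import unicode_literals at the top of the module has the same effect for every unprefixed string literal in that file.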