Python 3.4 and Python 3.8/3.9 behave differently when I execute the statement below:
print('\u212B')
Python 3.8/3.9 can print it correctly.
Å
Python 3.4 will report an exception:
Traceback (most recent call last):
  File "test.py", line 9, in <module>
    print('\u212B')
UnicodeEncodeError: 'gbk' codec can't encode character '\u212b' in position 0: illegal multibyte sequence
According to this page, I can avoid the exception by overwriting sys.stdout with this statement:
sys.stdout = io.TextIOWrapper(buffer=sys.stdout.buffer,encoding='utf-8')
But Python 3.4 still prints a different character, as below:
鈩?
So my questions are:
- Why do different Python versions behave differently when printing to standard output?
- How can I print the correct value Å in Python 3.4?
Edit 1:
I guess the difference is caused by PEP 528 -- Change Windows console encoding to UTF-8. But I still don't understand the mechanism of console encoding or how I can print the correct character in Python 3.4.
Edit 2:
One more difference: sys.getfilesystemencoding() returns utf-8 in Python 3.8/3.9 and mbcs in Python 3.4.
Why?
Regarding the rationale behind the stdout encoding, you can read more in the answers here: Changing default encoding of Python?
In short, Python 3.4 uses your OS's encoding (gbk in your case) as the default for stdout, whereas with Python 3.8 it is set to UTF-8.
How to fix this?
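You can use a new method, reconfigure, introduced with Python 3.7, for example sys.stdout.reconfigure(encoding='utf-8').
Typically, you can also try setting the environment variable PYTHONIOENCODING to utf-8. This works in most operating systems except Windows, where another environment variable must be set for it to work (according to the Python documentation, PYTHONLEGACYWINDOWSSTDIO).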
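As a rough sketch of the reconfigure approach with a fallback for older interpreters: the helper name force_utf8 is made up for this illustration, and the in-memory stream only stands in for a gbk-encoded stdout so the effect can be checked without a real console.

```python
import io


def force_utf8(stream):
    """Return a text stream that encodes everything written to it as UTF-8.

    force_utf8 is a hypothetical helper name used only for this sketch.
    """
    if hasattr(stream, "reconfigure"):
        # Python 3.7+: change the encoding of the existing wrapper in place.
        stream.reconfigure(encoding="utf-8")
        return stream
    # Older versions (such as 3.4): wrap the underlying byte buffer again,
    # which is what the statement in the question does.
    return io.TextIOWrapper(stream.buffer, encoding="utf-8", line_buffering=True)


# Stand-in for a stdout whose default encoding is gbk.
raw = io.BytesIO()
gbk_stream = io.TextIOWrapper(raw, encoding="gbk")

out = force_utf8(gbk_stream)
out.write("\u212B")
out.flush()
written = raw.getvalue()  # the bytes Python actually emitted
print(written)
```

In a real script this would presumably be used as sys.stdout = force_utf8(sys.stdout). Note that it only changes the bytes Python emits. The console still has to interpret those bytes as UTF-8 (on Windows, for example, after chcp 65001), otherwise you get garbled output like the one shown in the question.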
In Python versions preceding 3.7, you can fix it by installing the win-unicode-console package, which handles UTF issues transparently on Windows.
If you are not running the code directly from a console, there is a possibility that your IDE configuration is interfering.
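To see where both symptoms in the question come from, here is a small sketch that needs no console at all. It assumes the console code page is gbk, as the traceback suggests, and the variable names are only illustrative.

```python
# U+212B (ANGSTROM SIGN) takes three bytes in UTF-8.
utf8_bytes = "\u212B".encode("utf-8")

# gbk has no mapping for U+212B, which is the UnicodeEncodeError
# that Python 3.4 raises when stdout uses the gbk codec.
try:
    "\u212B".encode("gbk")
    gbk_can_encode = True
except UnicodeEncodeError:
    gbk_can_encode = False

# After stdout is rewrapped as UTF-8, Python emits the UTF-8 bytes, but a
# console that still assumes gbk decodes them as gbk text: the first two
# bytes form one CJK character and the third byte is left over.
mojibake = utf8_bytes.decode("gbk", errors="replace")

# ascii() keeps this print safe on any stdout encoding.
print(utf8_bytes, gbk_can_encode, ascii(mojibake))
```

So the TextIOWrapper workaround removes the exception but not the mismatch: Python and the console have to agree on the encoding before Å shows up correctly.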