A certain Python API returns u'J\xe4rvenp\xe4\xe4' for the Finnish word Järvenpää, where \xe4 == ä.
I then call email.header to add this field to a header to be printed. email.header falls over when it tries to decode the umlaut:
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/email/header.py", line 73, in decode_header
header = str(header)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)
I've tried a couple of things:
- Adding # -*- coding: utf-8 -*- to the top of header.py
- Calling unicode() on the Finnish string before passing it to email.header
- Calling .encode('utf-8') on the Finnish string before passing it to email.header
None of these solved the problem. What am I doing wrong? I'd imagine that a solution won't involve modifying header.py (a core Python module).
Python version: 2.7.10
UPDATE:
Header() is not being instantiated directly. Rather, I'm calling the decode_header() function on the string:
email.Header.decode_header(theString)
It now seems that simply extending the call like this:
email.Header.decode_header(theString.encode('utf-8'))
solves the problem.
In order to have the email.header module handle encoding for you and create a proper header, you have to create an instance of email.header.Header with your string and the charset it should be encoded in.
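As a minimal sketch of such a call, assuming the string from the question and UTF-8 as the target charset (the variable names are only illustrative), something like this should work:

```python
from email.header import Header, decode_header

name = u'J\xe4rvenp\xe4\xe4'  # the unicode string from the question

# Header takes the text plus the charset it should be encoded with.
header = Header(name, 'utf-8')

# encode() returns the RFC 2047 encoded-word form, which is pure ASCII.
encoded = header.encode()
print(encoded)

# decode_header() accepts that encoded form and returns
# (byte string, charset) pairs, so the original text can be recovered.
raw, charset = decode_header(encoded)[0]
restored = raw.decode(charset)
```

The encoded value is what goes into the message; decode_header() is for the opposite direction, reading a header that is already encoded.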
The string can be either a unicode string or a byte string. If it is a unicode string, charset will only affect what encoding the header is encoded with. If it is a byte string, charset will both determine what encoding the byte string is assumed to be in, and what encoding will be used to encode the header. If the byte string you provide can't be decoded with that charset, an exception will be raised.
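To illustrate the byte string case, here is a small sketch that assumes the input is "Järvenpää" encoded as ISO-8859-1 (an assumption for the example, not something the question states). It builds one header with the matching charset and shows the failure when the charset does not match:

```python
from email.header import Header, decode_header

# Assumed input: the Finnish word as ISO-8859-1 bytes.
latin1_bytes = b'J\xe4rvenp\xe4\xe4'

# The charset matches the bytes, so the header is built and encoded.
encoded = Header(latin1_bytes, 'iso-8859-1').encode()
raw, charset = decode_header(encoded)[0]
restored = raw.decode(charset)

# These bytes are not valid UTF-8, so claiming charset='utf-8' fails
# while Header tries to decode them.
try:
    Header(latin1_bytes, 'utf-8')
    failed = False
except UnicodeError:
    failed = True
```

Catching UnicodeError covers the UnicodeDecodeError raised for undecodable input, since the latter is a subclass.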