email.header not handling Finnish characters?

Question

Asked by Pyderman on 17 June 2015 at 13:23

A certain Python API returns u'J\xe4rvenp\xe4\xe4' for the Finnish word Järvenpää.

where \xe4 == ä

I then call email.header to add this field to a header to be printed.

email.header falls over when it tries to decode the umlaut:

  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/email/header.py", line 73, in decode_header
    header = str(header)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 1: ordinal not in range(128)

I've tried a couple of things:

Adding # -*- coding: utf-8 -*- to the top of header.py
Calling unicode() on the Finnish string before passing it to email.header
Calling .encode('utf-8') on the Finnish string before passing it to email.header

None of these solved the problem. What am I doing wrong? I'd imagine that a solution won't involve modifying header.py (a core Python module).

Python version: 2.7.10

UPDATE:

Header() is not being instantiated directly. Rather, I'm calling the decode_header() function on the string:

email.Header.decode_header(theString)

It now seems that simply extending this as follows:

email.Header.decode_header(theString.encode('utf-8'))

solves the problem.

Original Q&A

There are 2 answers

Alex Ivanov On 17 June 2015 at 13:46

AFAIK, str() implicitly encodes with the ASCII codec, which is why you get an error. If your string is already unicode, you should do header = unicode(header); if it is a byte string, it should be decoded first.

#!/usr/bin/python
# -*- coding: utf-8 -*-

# Python 2: decode the UTF-8 byte string to unicode; the outer unicode() call is then a no-op.
header = unicode("Järvenpää".decode('UTF-8'))
print header

Output

Järvenpää
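
The implicit ASCII step can be reproduced without email.header at all. What follows is a minimal sketch, assuming the same string as in the question. The variable names are only illustrative, and the explicit .encode('ascii') stands in for what str() does to a unicode object on Python 2 with the default encoding:

```python
# -*- coding: utf-8 -*-
# Sketch only: reproduces the codec error from the traceback without email.header.
name = u'J\xe4rvenp\xe4\xe4'  # the value returned by the API in the question

# On Python 2, str(name) encodes with the default ASCII codec. This is the
# explicit equivalent, and it fails the same way on Python 2 and Python 3.
try:
    name.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# Encoding explicitly with a codec that covers the character succeeds.
utf8_bytes = name.encode('utf-8')
```

This is probably why passing theString.encode('utf-8') made the traceback go away: the value is already a byte string, so no implicit ASCII encoding is attempted.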

Klaus D. (Accepted Answer) On 17 June 2015 at 13:31

In order to have the email.header module handle encoding for you and create a proper header, you have to create an instance of email.header.Header with your string and the charset it should be encoded in:

>>> h = Header(text, charset)

For example:

>>> t = u'J\xe4rvenp\xe4\xe4'
>>> print t
Järvenpää
>>> from email.header import Header
>>> h = Header(t, 'utf-8')
>>> h
<email.header.Header instance at 0x7fc2636e7950>
>>> print h
=?utf-8?b?SsOkcnZlbnDDpMOk?=
>>> h = Header(t, 'iso-8859-1')
>>> print h
=?iso-8859-1?q?J=E4rvenp=E4=E4?=
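
For context, decode_header() from the question goes in the opposite direction: it parses an already-encoded header value back into (bytes, charset) pairs. Here is a rough round-trip sketch, assuming the same string as in the example above; the variable names are illustrative only:

```python
from email.header import Header, decode_header

name = u'J\xe4rvenp\xe4\xe4'

# Encode: unicode text -> ASCII-only encoded word suitable for a mail header.
encoded = Header(name, 'utf-8').encode()

# Decode: encoded word -> list of (bytes, charset) pairs.
parts = decode_header(encoded)
raw, charset = parts[0]

# The bytes still have to be decoded with the reported charset to get text back.
roundtrip = raw.decode(charset)
```

So Header() is the tool for building a header from text, while decode_header() is meant for reading one that has already been encoded.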

The string can be either a unicode string or a byte string.

If you use a unicode string, the charset will only affect what encoding the header is encoded with.
If you use a byte string, the charset will both determine what encoding the byte string is assumed to be in, and what encoding will be used to encode the header. If the byte string you provide can't be decoded with that charset, an exception will be raised.
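
The two input cases can be compared side by side. This is a small sketch under the assumption that the byte string is ISO-8859-1 encoded; the variable names are hypothetical and not part of the email API:

```python
from email.header import Header

text = u'J\xe4rvenp\xe4\xe4'
latin1_bytes = text.encode('iso-8859-1')  # byte string in ISO-8859-1

# Unicode input: the charset only selects the encoding used for the header.
from_unicode = Header(text, 'iso-8859-1').encode()

# Byte-string input: the charset is also used to interpret the bytes.
from_bytes = Header(latin1_bytes, 'iso-8859-1').encode()

# Bytes that are not valid in the stated charset are rejected with an exception.
try:
    Header(latin1_bytes, 'utf-8')
    mismatch_error = None
except UnicodeError as exc:
    mismatch_error = exc
```

Both valid calls should produce the same encoded header, while the mismatched charset raises a UnicodeError subclass instead of silently producing a wrong header.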
