Turkish character encoding


I am trying to build a new sentence from items in different lists. It raises an error when I print it with unicode(). I can print it normally (without unicode()). When I try to post it to the website, it raises the same error. I thought that if I can fix it with unicode(), it will also work when I post it to the website.

from random import randint

p=['Bu', 'Şu']
k=['yazı','makale']
t=['hoş','ilgiç']
connect='%s %s %s'%(p[randint(0,len(p)-1)],k[randint(0,len(k)-1)],t[randint(0,len(t)-1)])
print unicode(connect)  # this line raises the error below

And the output is:
Error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 0: ordinal not in range(128)

There are 4 answers

2
Phani On

You should put a header like this at the top of your script to declare the encoding the source file is saved in. It is worth reading more about this, as you will often run into these kinds of problems.

#!/usr/bin/env python
# -*- coding: latin-1 -*-

Be sure to replace 'latin-1' above with the encoding your script file is actually saved in (for example, utf-8).
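
As a rough way to pick the right value, the sketch below checks which codecs can represent the Turkish letters from the question at all. The helper name can_encode and the sample string are made up for illustration; the codec names are standard Python ones. It is written to run on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: which codecs can represent the Turkish letters used in the question?
sample = u'Şu yazı hoş'


def can_encode(text, codec):
    """Return True if every character of text exists in codec (hypothetical helper)."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Map each candidate codec name to whether it can hold the sample text.
results = dict((codec, can_encode(sample, codec))
               for codec in ('ascii', 'latin-1', 'iso-8859-9', 'utf-8'))
```

With this sample, 'ascii' and 'latin-1' fail because they have no 'Ş', 'ş' or 'ı', while 'iso-8859-9' (the Turkish Latin-5 code page) and 'utf-8' succeed. So a file containing these letters cannot really be saved as latin-1, and the header should name whichever of the working encodings your editor uses.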

1
Burak Yılmaztürk On

First of all, you should put # -*- coding: utf-8 -*- at the top of your script to be able to use non-ASCII characters in it. Also, decoding the str to unicode before printing will solve your problem.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from random import randint

p=['Bu', 'şu']
k=['yazı','makale']
t=['hoş','ilginç']
connect='%s %s %s'%(p[randint(0,len(p)-1)],k[randint(0,len(k)-1)],t[randint(0,len(t)-1)])
print connect.decode('utf-8')
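
To illustrate why the explicit decode helps, here is a minimal sketch that takes the bytes a UTF-8 source file stores for 'Şu' and decodes them two ways. It assumes the default Python 2 setup, where unicode(some_str) decodes with the 'ascii' codec; the variable names are only for illustration, and the code avoids print so it runs the same on Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# Sketch: the same bytes decoded with the wrong and the right codec.
raw = u'Şu'.encode('utf-8')      # byte string, like a plain 'Şu' literal in Python 2

# 'Ş' (U+015E) takes two bytes in UTF-8; the first one is 0xC5,
# the byte named in the question's error message.
first_byte = bytearray(raw)[0]

try:
    raw.decode('ascii')          # roughly what unicode(connect) does by default
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc            # "can't decode byte 0xc5 in position 0"

text = raw.decode('utf-8')       # naming the real encoding succeeds
```

The ASCII codec only covers bytes 0 to 127, so it stops at the very first byte of 'Ş'. Decoding with 'utf-8' works here only because the file is saved as UTF-8; with a different file encoding you would pass that codec name instead.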
0
Irshad Bhat On
>>> from random import randint
>>> p=['Bu', 'Şu']
>>> k=['yazı','makale']
>>> t=['hoş','ilgiç']
>>> connect='%s %s %s'%(p[randint(0,len(p)-1)],k[randint(0,len(k)-1)],t[randint(0,len(t)-1)])
>>> print connect.decode('utf-8')
Şu makale ilgiç
0
Mark Tolonen On

When using non-ASCII characters, specify the encoding of the source code at the top of the file. Then, use Unicode strings for all text:

#coding:utf8
from random import randint
p=[u'Bu', u'Şu']
k=[u'yazı', u'makale']
t=[u'hoş', u'ilgiç']
connect= u'%s %s %s'%(p[randint(0,len(p)-1)],k[randint(0,len(k)-1)],t[randint(0,len(t)-1)])
print connect

Output:

Şu yazı ilgiç

You could still get a UnicodeEncodeError if your execution environment's output encoding doesn't support these characters. Ideally, use an environment whose output encoding is UTF-8.
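
If you cannot change the environment, one possible workaround is to encode the text yourself and let the codec substitute characters it cannot represent. The sketch below is only an illustration: encode_for_output is a hypothetical helper, not a library function, and falling back to 'utf-8' when the stream reports no encoding is an assumption.

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_output(text, encoding=None):
    """Encode text for an output stream, replacing characters the codec lacks.

    Hypothetical helper for illustration; not part of any library.
    """
    # sys.stdout.encoding can be None, e.g. when output is piped in Python 2,
    # so fall back to UTF-8 in that case (an assumption, adjust as needed).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'utf-8'
    return text.encode(encoding, 'replace')


sentence = u'Şu yazı hoş'
as_utf8 = encode_for_output(sentence, 'utf-8')    # lossless
as_ascii = encode_for_output(sentence, 'ascii')   # non-ASCII letters become '?'
```

The 'replace' error handler trades correctness for robustness: the program no longer crashes, but the Turkish letters are lost on a terminal that cannot show them, which is why a UTF-8 environment remains the better fix.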