python 2.7 character \u2013

I have the following code:

# -*- coding: utf-8 -*-

print u"William Burges (1827–81) was an English architect and designer."

When I try to run it from cmd, I get the following message:

Traceback (most recent call last):
  File "C:\Python27\utf8.py", line 3, in <module>
    print u"William Burges (1827ŌĆō81) was an English architect and designer."
  File "C:\Python27\lib\encodings\cp775.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 20: character maps to <undefined>

How can I solve this problem and get Python to print this \u2013 character? And why doesn't the existing code handle it? I thought UTF-8 worked for every character.

Thank you

EDIT:

This code prints the wanted outcome:

# -*- coding: utf-8 -*-

print unicode("William Burges (1827-81) was an English architect and designer.", "utf-8").encode("cp866")

But when I try to print more than one sentence, for example:

# -*- coding: utf-8 -*-

print unicode("William Burges (1827–81) was an English architect and designer. I am here. ", "utf-8").encode("cp866")

I get the same error message:

Traceback (most recent call last):
  File "C:\Python27\utf8vs.py", line 3, in <module>
    print unicode("William Burges (1827ŌĆō81) was an English architect and designer. I am here. ", "utf-8").encode("cp866")
  File "C:\Python27\lib\encodings\cp866.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013' in position 20: character maps to <undefined>

There are 3 answers

Jack Aidley

I suspect the problem lies with the print statement rather than with Python itself (it works fine on my Mac). To print the string, Python has to encode it into the console's character set, and the longer dash you've used isn't representable in the default code page of the Windows command line (cp775 in your traceback).
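
A minimal sketch of that idea, assuming print behaves like a strict `encode()` into the console's code page. The UTF-8 coding declaration only tells Python how to read the source file; it says nothing about what the console can display. The helper `first_unencodable` is a hypothetical name, and the string is written with a `\u2013` escape so the file stays plain ASCII:

```python
import sys

text = u"William Burges (1827\u201381) was an English architect and designer."

def first_unencodable(s, encoding):
    """Return (position, code point) of the first character that `encoding`
    cannot represent, or None if the whole string encodes cleanly."""
    try:
        s.encode(encoding)  # default "strict" errors, as in the traceback
    except UnicodeEncodeError as err:
        return err.start, "U+%04X" % ord(err.object[err.start])
    return None

# The encoding print has to use for this console (cp775 in the question).
print("stdout encoding: %r" % (getattr(sys.stdout, "encoding", None),))
print(first_unencodable(text, "utf-8"))  # None
print(first_unencodable(text, "cp775"))  # (20, 'U+2013')
```

The reported position, 20, is the same "position 20" that appears in the traceback.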

The difference between your two examples is not their length but the kind of dash used: "(1827-81)" versus "(1827–81)". Can you see the subtle difference? Try copying and pasting one over the other to check.
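
One way to make the difference visible is to print each dash's code point and Unicode name. This is a sketch; `working_dash` and `failing_dash` are hypothetical names for the characters taken from the two examples in the question:

```python
import unicodedata

working_dash = u"-"       # the dash in the example that prints fine
failing_dash = u"\u2013"  # the en dash in the example that fails

for ch in (working_dash, failing_dash):
    # ord() gives the code point; unicodedata.name() gives its Unicode name
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
# U+002D HYPHEN-MINUS
# U+2013 EN DASH
```

Only the first one is plain ASCII, which is why the first example encodes to cp866 without complaint.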

See also "Python, Unicode, and the Windows console".

Michael Kazarian

Your string contains an en dash (–). It looks similar to the ASCII hyphen-minus (-), which is character 45 in the ASCII table. Replace the en dash with a hyphen-minus, because ASCII has no en dash. Here is a working variant:

# -*- coding: utf-8 -*-

my_string = "William Burges (1827–81) was an English architect and designer."
my_string = my_string.replace("–", "-")  # replace the UTF-8 en dash with the ASCII hyphen-minus
print my_string

Output:

William Burges (1827-81) was an English architect and designer.
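
The same idea can also be applied to unicode text rather than UTF-8 bytes, and to several look-alike characters at once, using `unicode.translate` (`str.translate` in Python 3) with a dict from code points to replacements. This is a sketch: `ASCII_LOOKALIKES` and `to_ascii_lookalikes` are hypothetical names, and the mapping is an illustrative subset, not a standard table. Like the answer's `replace`, it is lossy:

```python
# Hypothetical mapping: typographic punctuation -> closest ASCII look-alike.
ASCII_LOOKALIKES = {
    0x2013: u"-",   # EN DASH
    0x2014: u"-",   # EM DASH
    0x2018: u"'",   # LEFT SINGLE QUOTATION MARK
    0x2019: u"'",   # RIGHT SINGLE QUOTATION MARK
    0x201C: u'"',   # LEFT DOUBLE QUOTATION MARK
    0x201D: u'"',   # RIGHT DOUBLE QUOTATION MARK
}

def to_ascii_lookalikes(text):
    """Replace the characters listed in ASCII_LOOKALIKES; leave the rest alone."""
    return text.translate(ASCII_LOOKALIKES)

text = u"William Burges (1827\u201381) was an English architect and designer. I am here."
cleaned = to_ascii_lookalikes(text)
cleaned.encode("cp866")  # no longer raises UnicodeEncodeError
print(cleaned)  # William Burges (1827-81) was an English architect and designer. I am here.
```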

Kirill Zaitsev

There is a wiki article on wiki.python.org about this issue, https://wiki.python.org/moin/PrintFails, that explains why this can happen with the charmap codec. From the article:

Setting the PYTHONIOENCODING environment variable as described in the article can be used to suppress the error messages. Setting it to "utf-8" is not recommended, as this produces an inaccurate, garbled representation of the output on the console. For best results, use your console's correct default code page and a suitable error handler other than "strict".
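
To see what the non-strict error handlers do, here is a sketch that encodes the question's sentence for cp775 (the code page from the traceback) with each handler instead of raising. `encode_for_console` is a hypothetical helper; the handler names are the standard codec error handlers:

```python
text = u"William Burges (1827\u201381) was an English architect and designer."

def encode_for_console(s, encoding="cp775", errors="replace"):
    """Encode s for a console code page without raising on unmappable
    characters; `errors` picks what to substitute for them."""
    return s.encode(encoding, errors)

for handler in ("replace", "backslashreplace", "xmlcharrefreplace"):
    # All three handlers produce pure ASCII here, so decoding as ASCII is safe.
    print("%-17s %s" % (handler, encode_for_console(text, errors=handler).decode("ascii")))
```

"replace" turns the en dash into `?`, "backslashreplace" into `\u2013`, and "xmlcharrefreplace" into `&#8211;`. To get the same behavior from print itself without changing the script, PYTHONIOENCODING accepts an `encodingname:errorhandler` value (supported since Python 2.6), e.g. `set PYTHONIOENCODING=cp775:replace` in cmd before running the script.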