python unicode woes - convert cp1252 string to unicode

I think I'm just fundamentally confused about char sets that are not ascii.

I have a python file that I have declared at the top to be # -*- coding: cp1252 -*-.

In the file I have question = "what is your borther’s name", for example.

type(question)

>> str

question

>> 'what is your borther\xe2\x80\x99s name'

I cannot convert it to unicode at this point, presumably because you can't go from ASCII to Unicode:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20: ordinal not in range(128)
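A note on the bytes shown above: \xe2\x80\x99 is the UTF-8 encoding of U+2019 (right single quotation mark), whereas in cp1252 that character is the single byte \x92. That suggests, though it is only an assumption here, that the editor saved the file as UTF-8 despite the cp1252 declaration. The error comes from decoding with the default 'ascii' codec rather than naming the real one. A minimal sketch, with illustrative variable names, written so it runs on Python 2.6+ and Python 3:

```python
# The byte string from the question, written with explicit escapes.
raw = b'what is your borther\xe2\x80\x99s name'

# Decoding with the codec that actually produced the bytes succeeds.
text = raw.decode('utf-8')

# The ASCII codec fails on byte 0xe2, as in the traceback above.
try:
    raw.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# In cp1252 the same apostrophe is a single byte (0x92).
cp1252_bytes = text.encode('cp1252')
```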

If I declare it as unicode to begin with:

question = u"what is your borther’s name"

>> u'what is your borther\u2019s name'

How do I get "what is your borther’s name" back? Or is this just how the Python interpreter displays unicode strings, and it will in fact encode correctly when I pass it to a Unicode-aware application (in this case, Office)?
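As far as I can tell, the \u2019 form is only the repr that the interactive interpreter echoes; the string itself holds a single code point for the apostrophe, and printing it (on a terminal that can display the character) shows the apostrophe itself. A small sketch to check this, with illustrative variable names, that runs on Python 2.6+ and Python 3:

```python
text = u'what is your borther\u2019s name'

# The escaped form is just another way of writing the same string.
escaped = text.encode('unicode_escape')

# The apostrophe counts as one character, not six.
length = len(text)

# These are the bytes that would be written to a UTF-8 file or stream.
utf8_bytes = text.encode('utf-8')
```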

I need to preserve the special characters, but I still need to do a string comparison using the Levenshtein library (pip install python-Levenshtein).

Levenshtein.ratio takes str or unicode for both of its arguments, but not mixed.
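One way to avoid mixing types is to normalise both arguments to unicode before comparing. The sketch below is only an illustration: to_text and similarity are hypothetical helper names, and the default 'utf-8' encoding is an assumption that should be replaced by whatever encoding the byte strings really use.

```python
def to_text(value, encoding='utf-8'):
    """Return value as a unicode string, decoding byte strings."""
    # On Python 2, bytes is an alias for str, so plain str gets decoded.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def similarity(a, b, encoding='utf-8'):
    """Compare two strings after converting both to unicode."""
    import Levenshtein  # third-party: pip install python-Levenshtein
    return Levenshtein.ratio(to_text(a, encoding), to_text(b, encoding))
```

With this, similarity(question, other) receives two unicode strings regardless of whether the inputs started out as str or unicode.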

There is 1 answer

Answer by Ignacio Vazquez-Abrams

I have a plain text file that I have declared at the top to be # -*- coding: cp1252 -*-.

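That does nothing for a plain text file. The coding declaration only tells Python how to decode the source file it appears in. To read text as cp1252, open the file with that encoding: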

import codecs

with codecs.open(..., encoding='cp1252') as fp:
    ...
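To make that concrete, here is a self-contained sketch that writes a small cp1252 file and reads it back as unicode. The temporary file and its contents are made up for the example, and io.open is used in place of codecs.open; both accept an encoding argument, and io.open behaves the same on Python 2.6+ and Python 3.

```python
import io
import os
import tempfile

# Create a small cp1252-encoded file to read back (illustrative data).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(b'what is your borther\x92s name\n')

# io.open decodes while reading, so every line is already unicode.
with io.open(path, encoding='cp1252') as fp:
    lines = [line.rstrip(u'\n') for line in fp]

# Clean up the temporary file.
os.remove(path)
```

Each element of lines is then a unicode string with the apostrophe as U+2019, ready to be compared with other unicode strings.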