I think I'm just fundamentally confused about char sets that are not ascii.
I have a python file that I have declared at the top to be # -*- coding: cp1252 -*-
.
In the file I have question = "what is your borther’s name"
, for example.
type(question)
>> str
question
>> 'what is your borther\xe2\x80\x99s name'
And I cannot convert to unicode at this point, presumably because you can't go from ASCII to Unicode.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20: ordinal not in range(128)
if I declare as unicode to begin with:
question = "what is your borther’s name"
>> u'what is your borther\u2019s name'
How do I get "what is your borther’s name" back? Or is just a how python interpreter displays unicode strings and it in fact will encode correctly when I pass it to an unicode-aware application (in this case, Office)?
I need to preserve the special characters but I still need to do a string comparison using Levenshtein library (pip install python-Levenshtein
).
Levenshtein.ratio takes str or unicode for both of its arguments, but not mixed.
That does nothing.