python unicode woes - convert cp1252 string to unicode

Question


822 views Asked by stewart99 At 09 January 2014 at 10:44

I think I'm just fundamentally confused about character sets that are not ASCII.

I have a Python file that I have declared at the top to be # -*- coding: cp1252 -*-.

In the file I have question = "what is your borther’s name", for example.
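A `# -*- coding: ... -*-` line (PEP 263) only tells the Python parser how to decode the bytes of that source file; it converts nothing by itself, and it has to match how the editor actually saved the file. The Python 3 sketch below stands in for a `.py` file by passing in-memory source bytes to `compile()`; `load_question` and the sample source are illustrative, not from the question. It shows that a file truly saved as cp1252 gives back ’, while a file saved as UTF-8 but declared cp1252 turns ’ into three wrong characters.

```python
# Sketch (Python 3): the coding declaration controls how source bytes are decoded.
DECLARATION = u"# -*- coding: cp1252 -*-\n"
LINE = u"question = u'what is your borther\u2019s name'\n"


def load_question(source_bytes):
    """Compile and run source bytes, returning the value bound to `question`."""
    namespace = {}
    exec(compile(source_bytes, '<in-memory source>', 'exec'), namespace)
    return namespace['question']


# File really saved as cp1252: the declaration matches, so ’ round-trips.
as_cp1252 = load_question((DECLARATION + LINE).encode('cp1252'))

# File really saved as UTF-8 but declared cp1252: the UTF-8 bytes e2 80 99
# for ’ are decoded as three separate cp1252 characters.
as_utf8 = load_question((DECLARATION + LINE).encode('utf-8'))
```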

type(question)

>> str

question

>> 'what is your borther\xe2\x80\x99s name'

And I cannot convert it to unicode at this point, presumably because you can't go from ASCII to Unicode:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 20: ordinal not in range(128)
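The bytes shown above, `\xe2\x80\x99`, are the UTF-8 encoding of U+2019 (RIGHT SINGLE QUOTATION MARK). In cp1252 that character is the single byte `0x92`, which suggests the file was saved as UTF-8 despite the cp1252 declaration. The error itself comes from an implicit conversion: in Python 2, `unicode(question)` decodes with the default ASCII codec, which rejects any byte above 127. A minimal Python 3 sketch, with the byte string written as a `bytes` literal, of decoding with an explicit codec instead:

```python
# Sketch (Python 3): the question's byte string, decoded explicitly.
raw = b'what is your borther\xe2\x80\x99s name'

# What Python 2's unicode(raw) does implicitly: decode as ASCII, which fails.
failed_at = None
try:
    raw.decode('ascii')
except UnicodeDecodeError as exc:
    failed_at = exc.start  # index of the first non-ASCII byte (0xe2)

# Decoding with the codec the bytes were actually written in succeeds.
question = raw.decode('utf-8')

# For comparison: the same character encoded as cp1252 is a single byte.
cp1252_bytes = u'\u2019'.encode('cp1252')
```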

If I declare it as unicode to begin with:

question = u"what is your borther’s name"

>> u'what is your borther\u2019s name'

How do I get "what is your borther’s name" back? Or is this just how the Python interpreter displays unicode strings, and it will in fact encode correctly when I pass it to a Unicode-aware application (in this case, Office)?
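The `\u2019` is only the display form: Python 2's interactive prompt echoes `repr()` of a value, which escapes non-ASCII characters, while the string itself holds the single character ’ (a `print` statement shows it as-is on a terminal that can render it). Bytes appear only when the string is encoded, and which encoding the receiving program expects depends on how you hand the text to Office, so the two codecs below are assumptions. A Python 3 sketch, where the built-in `ascii()` reproduces the old escaped display:

```python
# Sketch (Python 3): one string, several representations.
question = u'what is your borther\u2019s name'

# Escaped display, like Python 2's interactive echo of a unicode object.
shown = ascii(question)

# The value itself: U+2019 is one character, not the six shown in the escape.
length = len(question)
quote_char = question[20]

# Bytes only appear when encoding for another program; which codec the
# receiving side expects is an assumption here.
utf8_bytes = question.encode('utf-8')
cp1252_bytes = question.encode('cp1252')
```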

I need to preserve the special characters, but I still need to do a string comparison using the Levenshtein library (pip install python-Levenshtein).

Levenshtein.ratio accepts either str or unicode for its two arguments, but both must be the same type.
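One way to satisfy that constraint is to convert every argument to unicode text at the boundary, decoding byte strings with the encoding they were written in, before calling `Levenshtein.ratio`. In this sketch `to_text` and `similarity` are hypothetical helpers, UTF-8 is an assumed default, and `Levenshtein` is imported inside the function so the normalizing helper works without the library installed:

```python
def to_text(value, encoding='utf-8'):
    """Return `value` as text, decoding byte strings with `encoding`.

    On Python 2.6+ `bytes` is an alias of `str`, so the same check also
    catches Python 2 byte strings.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def similarity(a, b, encoding='utf-8'):
    """Compare two strings with python-Levenshtein after normalizing both."""
    import Levenshtein  # pip install python-Levenshtein
    return Levenshtein.ratio(to_text(a, encoding), to_text(b, encoding))


# Both inputs end up as the same type, whatever type they arrived as.
left = to_text(b'what is your borther\xe2\x80\x99s name')
right = to_text(u'what is your brother\u2019s name')
```

The design choice is to decode as early as possible and compare only text, so the encoding question is settled once rather than at every call site.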


**Ignacio Vazquez-Abrams** · Answer 1 · 2014-01-09T10:49:50+00:00


> I have a plain text file that I have declared at the top to be # -*- coding: cp1252 -*-.

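That does nothing: a coding declaration only tells Python how to decode its own source files, not how to read your data. Open the file with the encoding instead: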

import codecs

with codecs.open(..., encoding='cp1252') as fp:
    text = fp.read()  # already decoded from cp1252 into unicode
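Expanded into a self-contained sketch: the temporary file and the `read_cp1252` helper are made up for the demo. `codecs.open` decodes while reading, so the result is already unicode; `io.open(path, encoding='cp1252')`, or the built-in `open(path, encoding='cp1252')` on Python 3, does the same.

```python
import codecs
import os
import tempfile


def read_cp1252(path):
    """Return the contents of a cp1252-encoded text file as unicode."""
    with codecs.open(path, encoding='cp1252') as fp:
        return fp.read()


sample = u'what is your borther\u2019s name'

# Write the sample to a temporary file as cp1252 (’ becomes the byte 0x92).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as out:
    out.write(sample.encode('cp1252'))

try:
    loaded = read_cp1252(path)
finally:
    os.remove(path)
```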
