In my application, following Ned Batchelder's recommendation of making a Unicode sandwich, I first decode the input from Windows-1252 and re-encode it as UTF-8:

row[field] = row[field].decode('cp1252').encode('utf-8')

Later on, when I want to send my data to an endpoint, I decode the UTF-8:

row[field] = fld.decode('utf-8')

When I print just the field that contains the offending Windows-1252 characters, they are displayed correctly:

print row['dash']
# as well – ... “the intent was”

But when I try to print the entire dictionary, I get the escaped UTF-8 bytes instead:

print row
# as well \xe2\x80\x93 ... \xe2\x80\x9cthe intent was\xe2\x80\x9d
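
A likely explanation, offered as a sketch rather than a confirmed diagnosis: printing a dict shows the `repr()` of each value, and `repr()` escapes non-ASCII bytes, whereas printing a single value writes the bytes straight to the terminal. The example below reproduces the escaped form and builds a readable line by decoding each value. The helper `format_row` and the sample row are hypothetical and not part of the original application.

```python
# -*- coding: utf-8 -*-
# Hypothetical row modelled on the example above; the value holds UTF-8 bytes.
row = {'dash': u'as well \u2013 ... \u201cthe intent was\u201d'.encode('utf-8')}

# Printing a dict displays repr() of each value, which escapes non-ASCII bytes.
escaped = repr(row)


def format_row(row, encoding='utf-8'):
    """Join key/value pairs as text, decoding byte values instead of using repr()."""
    parts = []
    for key, value in sorted(row.items()):
        if isinstance(value, bytes):
            # Assumption: byte values are UTF-8 encoded, as in the question.
            value = value.decode(encoding)
        parts.append(u'{0}: {1}'.format(key, value))
    return u', '.join(parts)


line = format_row(row)
```

Printing `line` on a UTF-8 terminal should then show the dash and quotation marks themselves, while `escaped` contains the `\xe2\x80\x93` sequences seen above.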

I want the cp1252 characters themselves, or ASCII equivalents such as a straight quotation mark in place of the left or right quotation mark.
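
One way to get the straight-quote equivalents, sketched under the assumption that the text has already been decoded to unicode: pass a translation table to `translate()`. The table `ASCII_EQUIVALENTS` and the helper `to_ascii_punctuation` are hypothetical names, and the mapping covers only a few common Windows-1252 punctuation characters.

```python
# -*- coding: utf-8 -*-
# Hypothetical mapping of code points to ASCII stand-ins; extend as needed.
ASCII_EQUIVALENTS = {
    0x2018: u"'",    # left single quotation mark
    0x2019: u"'",    # right single quotation mark
    0x201C: u'"',    # left double quotation mark
    0x201D: u'"',    # right double quotation mark
    0x2013: u'-',    # en dash
    0x2014: u'-',    # em dash
    0x2026: u'...',  # horizontal ellipsis
}


def to_ascii_punctuation(text):
    """Replace common Windows-1252 punctuation in a unicode string with ASCII."""
    return text.translate(ASCII_EQUIVALENTS)


# cp1252 bytes: 0x96 is an en dash, 0x93 and 0x94 are curly double quotes.
raw = b'as well \x96 ... \x93the intent was\x94'
text = raw.decode('cp1252')          # decode once, at the input boundary
cleaned = to_ascii_punctuation(text)
print(cleaned)
```

Because the replacement happens on decoded text, it fits in the middle of the Unicode sandwich; encoding to bytes can still be left to the point where the data is sent out.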
