My file is in unicode. However, for some reason, I want to change it to plain ascii while dropping any characters that are not recognized in ascii. For example, I want to change u'This is a string�' to just 'This is a string'. Following is the code I use to do so.
ascii_str = unicode_str.encode('ascii', 'ignore')
However, I still get the following annoying error.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 0:
ordinal not in range(128)
How can I solve this problem? I am fine with plain ascii strings.
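I assume that your unicode_str is meant to be a real unicode string. The UnicodeDecodeError suggests that it is actually a byte string: in Python 2, calling encode on a byte string first decodes it implicitly as ASCII, and that hidden step is what fails. If it is not a unicode string, decode it with its actual encoding first and then encode the result.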
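The decode-then-encode step could look roughly like the sketch below. The sample bytes and the Latin-1 codec are assumptions made for illustration; substitute whatever encoding your data is really in. The code runs on Python 2 and Python 3 (where the result is a bytes object).

```python
# Hypothetical input: a byte string, assumed here to be Latin-1 encoded.
raw_bytes = b"This is a string\xf3"

# Step 1: decode the bytes into a real unicode string using the assumed codec.
unicode_str = raw_bytes.decode("latin-1")

# Step 2: encode to ASCII, dropping characters that have no ASCII equivalent.
ascii_str = unicode_str.encode("ascii", "ignore")
```

Because the decode is now explicit and uses a codec that covers byte 0xf3, the implicit ASCII decode never happens.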
The best way is always to find out which encoding you are dealing with and then decode it, so that you have a unicode string in the right format. This means unicode_str should either be a real unicode string from the start or be read with the right codec. I assume that there is a file, so the very best would be to read that file with the right codec. Another, more desperate approach would be to throw away the bytes that cannot be decoded when the encoding is not known.
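Both ideas can be sketched as follows. The function names, the default of UTF-8, and the exact form of the fallback are assumptions, not the original answer's code. io.open is used because it accepts an encoding argument on both Python 2 and Python 3.

```python
import io


def read_file_as_ascii(path, encoding="utf-8"):
    """Read a text file with a known codec, then drop non-ASCII characters."""
    # io.open decodes while reading, so `text` is a real unicode string.
    with io.open(path, "r", encoding=encoding) as handle:
        text = handle.read()
    return text.encode("ascii", "ignore")


def bytes_to_ascii_lossy(raw_bytes):
    """Fallback for an unknown encoding: discard every non-ASCII byte."""
    # Decoding as ASCII with 'ignore' skips any byte above 0x7f.
    return raw_bytes.decode("ascii", "ignore").encode("ascii")
```

The fallback is lossy in a cruder way than the first function: it drops individual bytes rather than whole characters, so use it only when the real encoding cannot be determined.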