I need to get rid of Polish characters in a string I got from an XML file. I use .replace(), but in this case it doesn't work. Why? The code:
# -*- coding: utf-8
from prestapyt import PrestaShopWebService
from xml.etree import ElementTree
prestashop = PrestaShopWebService('http://localhost/prestashop/api',
                                  'key')
prestashop.debug = True
name = ElementTree.tostring(
    prestashop.search('products',
                      options={'display': '[name]', 'filter[id]': '[2]'}),
    encoding='cp852', method='text')
print name
print name.replace('ł', 'l')
Output:
Naturalne mydło odświeżające
Naturalne mydło odświeżające
But when I try to replace a non-Polish character, it works fine.
print name
print name.replace('a', 'o')
Result:
Naturalne mydło odświeżające
Noturolne mydło odświeżojące
This also works fine:
name = "Naturalne mydło odświeżające"
print name.replace('ł', 'l')
Any advice?
You are mixing encodings with your byte strings. The issue can be reproduced with a short working example in which, as in your output, no replacement is made. I assume you are running in a Windows console that defaults to an encoding of cp852.
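A minimal sketch of such a reproduction follows. It is written for Python 3, where byte strings are explicit, and it builds the element by hand instead of calling the PrestaShop API; the 'data' tag and the variable names are assumptions made for illustration only.

```python
from xml.etree import ElementTree as et

# Hypothetical stand-in for the element returned by prestashop.search().
elem = et.Element('data')
elem.text = 'Naturalne mydło odświeżające'

# Requesting cp852 yields a byte string encoded in cp852.
name = et.tostring(elem, encoding='cp852', method='text')

# In Python 2, a plain 'ł' literal in a UTF-8 source file holds these bytes.
needle = 'ł'.encode('utf-8')

# The UTF-8 bytes never occur in the cp852 data, so nothing is replaced.
replaced = name.replace(needle, b'l')
print(replaced == name)  # True
```

The search bytes and the data bytes come from different encodings, so replace() finds no match and returns the data unchanged.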
The reason is, the name string was encoded in cp852 but the byte string constant 'ł' is encoded in the source code encoding of utf-8.
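The mismatch becomes visible when the same character is encoded both ways. This is a small Python 3 sketch, not the original listing; the variable names are illustrative.

```python
text = 'ł'

as_cp852 = text.encode('cp852')  # what the tostring(..., encoding='cp852') data contains
as_utf8 = text.encode('utf-8')   # what a byte literal in a UTF-8 source file contains

print(as_cp852)  # b'\x88'
print(as_utf8)   # b'\xc5\x82'
```

One character, two different byte sequences: a byte-level search for one will never match the other.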
The best solution is to use Unicode strings; with them, the replacement is made.
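A sketch of the Unicode approach, assuming Python 3 and a hand-built element in place of the PrestaShop response (the 'data' tag and the translation table are assumptions added for illustration). In Python 2 the equivalent would be to call .decode('cp852') on the result and use u'ł' and u'l' literals.

```python
from xml.etree import ElementTree as et

# Hypothetical stand-in for the element returned by prestashop.search().
elem = et.Element('data')
elem.text = 'Naturalne mydło odświeżające'

raw = et.tostring(elem, encoding='cp852', method='text')

# Decode the bytes once, then work only with Unicode text.
name = raw.decode('cp852')
fixed = name.replace('ł', 'l')
print(fixed)  # Naturalne mydlo odświeżające

# Optional: map every Polish letter to an ASCII look-alike in one pass.
POLISH_TO_ASCII = str.maketrans('ąćęłńóśźżĄĆĘŁŃÓŚŹŻ',
                                'acelnoszzACELNOSZZ')
plain = name.translate(POLISH_TO_ASCII)
print(plain)  # Naturalne mydlo odswiezajace
```

Decoding at the boundary means the search string and the data are compared as characters, so the source file encoding and the console code page no longer affect the match.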
Note that Python 3's et.tostring has a Unicode option, and string constants are Unicode by default. The repr() version of the string is more readable as well, but ascii() implements the old behavior. You'll also find that Python 3.6 will print Polish even to consoles not using a Polish code page, so maybe you wouldn't need to replace the characters at all.
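The Python 3 behavior described above can be sketched as follows. The element is again a hand-built stand-in with an assumed 'data' tag; encoding='unicode' is the ElementTree option that returns text instead of bytes.

```python
from xml.etree import ElementTree as et

# Hypothetical stand-in for the element returned by prestashop.search().
elem = et.Element('data')
elem.text = 'Naturalne mydło odświeżające'

# encoding='unicode' returns a str, so no decoding step is needed.
name = et.tostring(elem, encoding='unicode', method='text')

print(name.replace('ł', 'l'))  # Naturalne mydlo odświeżające
print(repr(name))              # 'Naturalne mydło odświeżające'
print(ascii(name))             # 'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce'
```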