I'm trying to get the index of 'J' in a string similar to myString = "███ ███ J ██"
so I use myString.find('J')
but it returns a much higher value than I expect, and if I replace '█' with 'M' or another letter of the alphabet I get a lower value. I don't understand what causes this.
█ character string indexed in python
152 views Asked by mel At
There are 3 answers
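To check your default encoding, run: python -c 'import sys; print(sys.getdefaultencoding())'
For Python 2.x the output is ascii
and this is the default used for implicit conversions between str and unicode in your programs. A plain literal such as "███ ███ J ██" is stored as a byte string, and each '█' takes several bytes in the source file's encoding (three in UTF-8), so .find('J') returns a byte offset rather than a character position. To handle non-ASCII characters, Python 2 provides the unicode type. See for yourself: create a variable myString = u"███ ███ J ██"
and call the .find('J')
method on it. The u
prefix tells the interpreter that the literal is a Unicode string. You can then use this variable as if it were a normal str.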
I've written "Unicode" in some places where I should have written "UTF-8". For the difference, check this great answer if you want to.
In Python 3.x, str is a Unicode string by default (and the default encoding is UTF-8), so this problem does not occur.
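To make the difference concrete, here is a minimal sketch that compares the position of 'J' counted in characters with its position counted in bytes. It assumes the text is encoded as UTF-8, where '█' (U+2588) takes three bytes. The variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Sketch: the same text as a Unicode string and as UTF-8 bytes.
text = u"███ ███ J ██"          # Unicode string: one item per character
raw = text.encode("utf-8")      # byte string: each '█' becomes 3 bytes (assuming UTF-8)

char_index = text.find(u"J")    # position counted in characters
byte_index = raw.find(b"J")     # position counted in bytes

print(char_index)  # 8
print(byte_index)  # 20
```

The byte offset is what a plain Python 2 str literal would report for a UTF-8 source file, which is why replacing '█' with a one-byte letter such as 'M' lowers the result.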
Try doing
myString = u"███ ███ J ██"
. This makes it a Unicode string instead of the Python 2.x default, which is a byte string. If you are reading it from a file or a file-like object (assuming the content is UTF-8), instead of doing
file.read()
, do file.read().decode('utf-8-sig')
.
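For the file case, a rough sketch follows. It assumes the file is UTF-8 encoded, possibly with a leading byte order mark. Turning the bytes read from a file into a Unicode string is a decode step, and the 'utf-8-sig' codec also drops the BOM if one is present. The path sample.txt and the temporary directory are hypothetical and exist only so the example can run on its own.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical sample file: UTF-8 text preceded by a byte order mark.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with io.open(path, "wb") as f:
    f.write(b"\xef\xbb\xbf" + u"███ ███ J ██".encode("utf-8"))

# Option 1: read raw bytes, then decode them into a Unicode string.
with io.open(path, "rb") as f:
    decoded = f.read().decode("utf-8-sig")

# Option 2: let io.open decode while reading.
with io.open(path, encoding="utf-8-sig") as f:
    text = f.read()

print(decoded.find(u"J"))  # index counted in characters
print(text == decoded)
```

io.open behaves the same way in Python 2 and Python 3, so the second option avoids handling the bytes yourself.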