█ character string indexed in python


I'm trying to get the index of 'J' in a string like myString = "███ ███ J ██", so I call myString.find('J'), but it returns a surprisingly high value. If I replace '█' with 'M' or another letter of the alphabet, I get a lower value. I don't understand what causes this.


There are 3 answers

0
Aereaux On BEST ANSWER

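Try myString = u"███ ███ J ██". The u prefix makes it a Unicode string. Without it, Python 2.x creates a byte string in which each █ takes several bytes (three in UTF-8), so find() returns a byte offset rather than a character index.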
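A minimal sketch of the difference, assuming the source file is saved as UTF-8 (the names text and raw are only for illustration). Encoding the Unicode string to UTF-8 reproduces the bytes a plain Python 2 literal would hold:

```python
# -*- coding: utf-8 -*-
# Unicode text: find() counts characters.
text = u"███ ███ J ██"

# UTF-8 bytes, i.e. what a plain "..." literal holds in Python 2
# when the source file is UTF-8 (assumption): find() counts bytes.
raw = text.encode("utf-8")

char_index = text.find(u"J")   # 8: six blocks plus two spaces come first
byte_index = raw.find(b"J")    # 20: each block is 3 bytes, 6 * 3 + 2

print(char_index)
print(byte_index)
```

Replacing each █ with 'M' makes both indexes 8, because 'M' is a single byte in UTF-8.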

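If you are reading it from a file or a file-like object, instead of using file.read() directly, do file.read().decode('utf-8-sig'). This turns the bytes into a Unicode string and drops a leading byte order mark if there is one.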
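For the file case, here is a sketch that assumes the file is UTF-8, possibly with a byte order mark. It uses io.open, which accepts an encoding argument on both Python 2 and 3. The temporary file is a hypothetical stand-in for your real one:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Create a scratch UTF-8 file with a BOM so the example is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(path, "w", encoding="utf-8-sig") as f:
    f.write(u"███ ███ J ██")

# Binary mode: you get bytes, so offsets count bytes (BOM included).
with io.open(path, "rb") as f:
    raw = f.read()

# Text mode with utf-8-sig: the BOM is stripped and you get characters.
with io.open(path, "r", encoding="utf-8-sig") as f:
    text = f.read()

os.remove(path)

byte_index = raw.find(b"J")
char_index = text.find(u"J")
print(byte_index)
print(char_index)
```

Decoding the bytes by hand with raw.decode("utf-8-sig") gives the same text as the text-mode read.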

2
kamarkiewicz On

To check your encoding run: python -c 'import sys; print(sys.getdefaultencoding())'

For Python 2.x the output is ascii, which is the default encoding for your programs. To handle non-ASCII characters, the developers provided a unicode type. See for yourself: create a variable myString = u"███ ███ J ██" and call its .find('J') method. The u prefix tells the interpreter that this is a Unicode string. You can then use the variable as if it were a normal str.

I've written "Unicode" in some places where I should have written "UTF-8". For the difference, check this great answer if you want to.

In Python 3.x, str is Unicode text by default, so this problem does not occur.

0
Zoran Pavlovic On

Check the settings of the console/SSH client you are using, and set its encoding to UTF-8.