█ character string indexed in python

Question

152 views Asked by mel At 09 June 2015 at 22:31

I'm trying to get the index of 'J' in a string similar to myString = "███ ███ J ██", so I call myString.find('J'). It returns a much higher value than I expect, and if I replace '█' with 'M' or another letter of the alphabet I get a lower value. I don't understand what causes this.

There are 3 answers

kamarkiewicz On 09 June 2015 at 22:49

To check your encoding run: python -c 'import sys; print(sys.getdefaultencoding())'

For Python 2.x the output is ascii, and that is the default encoding for your programs. A plain string literal in Python 2 is a byte string, so each '█' is stored as several bytes (three in UTF-8) and .find('J') returns a byte offset rather than a character position. For non-ASCII characters, Python 2 provides the unicode type. See for yourself: create a variable myString = u"███ ███ J ██" and call .find('J') on it. The u prefix tells the interpreter that the literal is a Unicode string. You can then use this variable much as you would a normal str.
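The difference between the two indexes can be sketched as follows. This is a minimal illustration written for Python 3, where bytes stands in for the Python 2 str type; it assumes the original source file was saved as UTF-8, which the question does not state.

```python
# Python 3 sketch: bytes plays the role of the Python 2 str type.
text = u"███ ███ J ██"        # Unicode string: one item per character
raw = text.encode("utf-8")    # assumed encoding: each █ (U+2588) takes 3 bytes

char_index = text.find(u"J")  # position counted in characters
byte_index = raw.find(b"J")   # position counted in bytes

print(char_index)  # 8
print(byte_index)  # 20
```

Replacing each '█' with 'M' makes every character one byte long, which is why the byte index then drops to match the character index.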

In some places above I wrote Unicode where I should have written UTF-8. For the difference, check this great answer if you want to.
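As a rough illustration of that distinction: Unicode assigns each character a code point, while UTF-8 is one way of storing that code point as bytes. The variable names below are illustrative only.

```python
block = u"\u2588"                   # the █ character (FULL BLOCK)

code_point = ord(block)             # Unicode: an abstract number for the character
utf8_bytes = block.encode("utf-8")  # UTF-8: a concrete byte sequence for that number

print(hex(code_point))   # 0x2588
print(len(block))        # 1 character
print(len(utf8_bytes))   # 3 bytes
```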

In Python 3.x, str is a Unicode string by default, so this problem does not occur.

Zoran Pavlovic On 09 June 2015 at 22:50

Check the settings of the console/ssh client you are using, and set its encoding to UTF-8.

Aereaux On 09 June 2015 at 22:39 BEST ANSWER

Try myString = u"███ ███ J ██". This makes it a Unicode string instead of the Python 2.x default, which is a byte string.

If you are reading it from a file or a file-like object, and assuming the file is UTF-8 encoded, instead of doing file.read(), do file.read().decode('utf-8-sig').
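For the file case, a minimal sketch is shown below. Note that turning the bytes read from a file into text is a decode step, and the utf-8-sig codec also drops a leading byte order mark if one is present. The file name sample.txt and its contents are hypothetical, and the file is assumed to be UTF-8 encoded.

```python
import codecs
import os
import tempfile

# Hypothetical sample file: UTF-8 text preceded by a byte order mark (BOM).
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(codecs.BOM_UTF8 + u"███ ███ J ██".encode("utf-8"))

# Read the raw bytes, then decode them into a Unicode string.
with open(path, "rb") as f:
    raw = f.read()

text = raw.decode("utf-8-sig")  # decodes UTF-8 and strips the BOM if present
index = text.find(u"J")

print(index)  # 8
```

An alternative with the same effect is io.open(path, encoding='utf-8-sig'), which decodes while reading and is available in both Python 2.6+ and Python 3.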
