Identical looking string but different bytes representation

Question


Asked by Zhi Qin Tan on 28 August 2020 at 14:25 (85 views)

The upper string was typed by me, while the bottom one was pulled from a database.

>>> bytes('TOYOTA', 'utf-8')
b'TOYOTA'
>>> bytes('ΤΟΥΟΤΑ', 'utf-8')
b'\xce\xa4\xce\x9f\xce\xa5\xce\x9f\xce\xa4\xce\x91'
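To see where the extra bytes come from, the two strings can be compared code point by code point. Below is a small sketch; the names `typed` and `from_db` are illustrative, and the database value is written with `\u` escapes so the code points are explicit. Each Latin letter is a single byte in UTF-8, while each Greek look-alike (U+0391 to U+03A5 here) encodes to two bytes starting with `0xCE`.

```python
# Compare each typed Latin letter with the look-alike character from the database.
typed = 'TOYOTA'
# The database value, written with escapes so the code points are visible.
from_db = '\u03a4\u039f\u03a5\u039f\u03a4\u0391'

for latin, lookalike in zip(typed, from_db):
    print(f'U+{ord(latin):04X} {latin.encode("utf-8")}  ->  '
          f'U+{ord(lookalike):04X} {lookalike.encode("utf-8")}')

# Total UTF-8 length of each string.
print(len(typed.encode('utf-8')), len(from_db.encode('utf-8')))  # prints: 6 12
```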

This causes undesirable results when I check whether the value exists:

>>> 'TOYOTA' == 'ΤΟΥΟΤΑ'
False

Any idea how to "fix" the incorrect string?


There is 1 answer

**mkrieger1** · Accepted Answer · 2020-08-28T15:01:09+00:00

It appears the characters in the database string are Greek capital letters:

>>> import unicodedata
>>> s = 'ΤΟΥΟΤΑ'
>>> for c in s:
...     print(unicodedata.name(c))
... 
GREEK CAPITAL LETTER TAU
GREEK CAPITAL LETTER OMICRON
GREEK CAPITAL LETTER UPSILON
GREEK CAPITAL LETTER OMICRON
GREEK CAPITAL LETTER TAU
GREEK CAPITAL LETTER ALPHA
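To find other affected values, the same `unicodedata.name` idea can be turned into a filter that reports only non-ASCII characters. This is a minimal sketch that assumes valid values in the table are plain ASCII; the helper `non_ascii_chars` and the `rows` sample are hypothetical, not part of the original answer.

```python
import unicodedata

def non_ascii_chars(s):
    """Return (index, character, Unicode name) for every non-ASCII character in s."""
    # unicodedata.name raises ValueError for unnamed code points unless a default is given.
    return [(i, c, unicodedata.name(c, 'UNKNOWN'))
            for i, c in enumerate(s) if ord(c) > 127]

# Hypothetical sample of values pulled from the database.
rows = ['TOYOTA', '\u03a4\u039f\u03a5\u039f\u03a4\u0391', 'HONDA']
for value in rows:
    found = non_ascii_chars(value)
    if found:
        print(value, found)
```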

You could try one of the available third-party libraries to transliterate the string into the Latin alphabet.
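If the goal is only to make look-alike strings compare equal, a stdlib-only alternative is to fold known homoglyphs with `str.translate`. Note that this is a visual (confusables) mapping, not a phonetic transliteration: a transliteration scheme may map Υ to something other than `Y`, so it would not necessarily produce `TOYOTA`. The table `GREEK_TO_LATIN` and the helper `fold_lookalikes` below are hypothetical and deliberately incomplete, covering only Greek capitals that resemble Latin capitals.

```python
# Hypothetical, incomplete table of Greek capitals that look like Latin capitals.
GREEK_TO_LATIN = str.maketrans({
    '\u0391': 'A',  # GREEK CAPITAL LETTER ALPHA
    '\u0392': 'B',  # BETA
    '\u0395': 'E',  # EPSILON
    '\u0396': 'Z',  # ZETA
    '\u0397': 'H',  # ETA
    '\u0399': 'I',  # IOTA
    '\u039a': 'K',  # KAPPA
    '\u039c': 'M',  # MU
    '\u039d': 'N',  # NU
    '\u039f': 'O',  # OMICRON
    '\u03a1': 'P',  # RHO
    '\u03a4': 'T',  # TAU
    '\u03a5': 'Y',  # UPSILON
    '\u03a7': 'X',  # CHI
})

def fold_lookalikes(s):
    """Replace Greek look-alike capitals with their Latin counterparts; other characters are kept."""
    return s.translate(GREEK_TO_LATIN)

print(fold_lookalikes('\u03a4\u039f\u03a5\u039f\u03a4\u0391') == 'TOYOTA')  # True
```

For an existence check, apply the same folding to both the typed value and the stored value before comparing, so that either side may contain the look-alikes.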

A similar question: "How can I create a string in english letters from another language word?"
