encode method for UTF16 and UTF32 in Python

124 views Asked by Wei Songhome At 22 December 2023 at 01:31

I want to investigate how UTF-16 and UTF-32 work by looking at the binary representation of specific characters.

I know we can use "string".encode("(encoding name)") to check a string's hex value in a specific encoding, and it works fine with UTF-8.

But when it comes to UTF-16 or UTF-32, I found the result is different from the encoded value it is supposed to be.

For example, take the first letter "あ" in Japanese. According to https://www.compart.com/en/unicode/U+3042, its hex values in UTF-8, UTF-16, and UTF-32 are E38182, 3042, and 00003042.
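To make those table values concrete, here is a rough sketch that rebuilds them by hand from the code point U+3042. It assumes a character in the range U+0800 to U+FFFF (excluding surrogates), where UTF-8 uses three bytes and UTF-16 uses a single 16-bit unit; the variable names are only illustrative.

```python
# Code point of the character (U+3042).
cp = ord("あ")

# UTF-16BE and UTF-32BE for this character are just the code point
# written as a 2-byte or 4-byte big-endian integer.
# (Assumption: the code point is in the BMP and not a surrogate.)
utf16_be = cp.to_bytes(2, "big")
utf32_be = cp.to_bytes(4, "big")

# UTF-8 three-byte pattern: 1110xxxx 10xxxxxx 10xxxxxx
# (Assumption: 0x0800 <= cp <= 0xFFFF.)
utf8 = bytes([
    0xE0 | (cp >> 12),          # top 4 bits
    0x80 | ((cp >> 6) & 0x3F),  # middle 6 bits
    0x80 | (cp & 0x3F),         # low 6 bits
])

print(hex(cp))         # 0x3042
print(utf8.hex())      # e38182
print(utf16_be.hex())  # 3042
print(utf32_be.hex())  # 00003042
```

The hand-built bytes can be compared against "あ".encode(...) to confirm they agree.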

So if I execute the following code:

print("あ".encode('utf-8'))
print("あ".encode('utf-16BE'))
print("あ".encode('utf-32BE'))

I get:

b'\xe3\x81\x82'
b'0B'
b'\x00\x000B'

As you can see, the UTF-8 result is identical to the code table, but the UTF-16 and UTF-32 results are weird... I have no idea how 000B can convert to 3042. Do I misunderstand something about the encode method?
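One likely explanation is that the encoded bytes are correct and only the display differs: when Python prints a bytes object, any byte that is printable ASCII is shown as that character instead of a \x escape. Byte 0x30 is the character "0" and byte 0x42 is the character "B", so b'0B' is the two bytes 30 42, not the hex digits "0B". A minimal sketch using bytes.hex() to view the raw hex (variable names are illustrative):

```python
text = "あ"

# bytes.hex() shows every byte as two hex digits, with no ASCII shortcuts.
utf8_hex = text.encode("utf-8").hex()
utf16_hex = text.encode("utf-16-be").hex()
utf32_hex = text.encode("utf-32-be").hex()

print(utf8_hex)   # e38182
print(utf16_hex)  # 3042
print(utf32_hex)  # 00003042

# The repr of the same two bytes uses ASCII characters where it can:
# 0x30 is "0" and 0x42 is "B".
print(bytes([0x30, 0x42]))  # b'0B'
```

The UTF-8 output happened to look right in the question only because none of its three bytes (E3, 81, 82) falls in the printable ASCII range.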


There are 0 answers
