Python base32 data decode


I am not able to understand why

import base64
base64.b32decode('siddh===', casefold=True)

works but

base64.b32decode('siddha==', casefold=True)

throws

TypeError: Incorrect padding

There are 2 answers

Eric Appelt (best answer)

The Python base64 module follows RFC 3548. For base32 encoding, the RFC states:

Padding at the end of the data is performed using the "=" character. Since all base 32 input is an integral number of octets, only the following cases can arise:

(1) the final quantum of encoding input is an integral multiple of 40 bits; here, the final unit of encoded output will be an integral multiple of 8 characters with no "=" padding,

(2) the final quantum of encoding input is exactly 8 bits; here, the final unit of encoded output will be two characters followed by six "=" padding characters,

(3) the final quantum of encoding input is exactly 16 bits; here, the final unit of encoded output will be four characters followed by four "=" padding characters,

(4) the final quantum of encoding input is exactly 24 bits; here, the final unit of encoded output will be five characters followed by three "=" padding characters, or

(5) the final quantum of encoding input is exactly 32 bits; here, the final unit of encoded output will be seven characters followed by one "=" padding character.

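You can see that there is no valid case for RFC 3548 base32 encoding that would result in six characters and two padding characters. That is exactly the shape of 'siddha==', whereas 'siddh===' (five characters, three padding characters) matches case (4).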
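The five cases above can be checked empirically. The following is a minimal sketch, not part of the original answer: it encodes 1 to 5 zero bytes with the standard library and counts the data and padding characters in the result. The helper name padding_by_input_length is made up for illustration.

```python
import base64

def padding_by_input_length(max_bytes=5):
    """Map input length in bytes to (data chars, '=' chars) in the encoded output.

    Hypothetical helper for illustration only.
    """
    result = {}
    for n in range(1, max_bytes + 1):
        encoded = base64.b32encode(b"\x00" * n).decode("ascii")
        pad = len(encoded) - len(encoded.rstrip("="))
        result[n] = (len(encoded) - pad, pad)
    return result

print(padding_by_input_length())
# {1: (2, 6), 2: (4, 4), 3: (5, 3), 4: (7, 1), 5: (8, 0)}
```

No input length produces two padding characters, so a conforming decoder has no input it could map such a string back to.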

Five characters give you 25 bits, which is enough to encode three bytes (24 bits) with one bit left over. Six characters give 30 bits, which is still short of the 32 needed for four bytes. Seven characters give 35 bits, which is enough for four bytes. Since six characters can hold no more whole bytes than five, that length is excluded from the standard for the final, padded group of eight characters.
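The bit counting above can be written out as a short calculation. This is only a sketch of the arithmetic, with a made-up helper name (whole_bytes): each base32 character carries 5 bits, and a character count is useful only if it holds more whole bytes than the count before it.

```python
def whole_bytes(num_chars):
    """Number of complete 8-bit bytes that fit in num_chars 5-bit characters."""
    return (num_chars * 5) // 8

# A character count is worth using only if it adds a whole byte
# compared with one character fewer.
useful_counts = [c for c in range(1, 9) if whole_bytes(c) > whole_bytes(c - 1)]

print(useful_counts)                   # [2, 4, 5, 7, 8]
print(whole_bytes(5), whole_bytes(6))  # 3 3
```

The counts 1, 3 and 6 drop out, which corresponds to the padding lengths 7, 5 and 2 never appearing in valid output.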

kravietz

Also note that the Base32 character set is the uppercase letters A-Z and the digits 2-7, plus = for padding, per RFC 4648 (which is newer than RFC 3548). There is also a variant called Base32-hex that uses 0-9 and A-V (plus =), so strictly speaking no Base32 decoder should accept lowercase letters.
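For what it is worth, Python's decoder is strict about case by default and only accepts lowercase when asked to via casefold=True, as in the question. A minimal sketch follows, assuming Python 3, where a rejected input raises binascii.Error (Python 2 raised TypeError, as in the question). The helper try_decode is hypothetical, and "MZXW6===" is the RFC 4648 test vector for "foo".

```python
import base64
import binascii

def try_decode(text, casefold=False):
    """Return the decoded bytes, or None if the decoder rejects the input.

    Hypothetical helper for illustration only.
    """
    try:
        return base64.b32decode(text, casefold=casefold)
    except (binascii.Error, TypeError):  # TypeError covers Python 2
        return None

print(try_decode("MZXW6==="))                 # b'foo'
print(try_decode("mzxw6==="))                 # None: lowercase rejected by default
print(try_decode("mzxw6===", casefold=True))  # b'foo'
```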