Why is my code returning an "ascii codec" error when I'm trying to use utf-8?

Asked by nalthea on 12 October 2023 at 01:21 (149 views)

I'm new to Python, and am only starting to use it as part of a CTF challenge (I'm a cybersecurity student). I was given a mostly pre-built "decoder" script, and the assignment was to complete it. So, I have my word list imported into the script as a list variable, I have a while loop running through each word, it's all great...except as soon as a non-ASCII character comes up in the word list (in this case it's the word "Français") I get this error thrown at me:

Traceback (most recent call last):
  File "dict.py", line 32, in <module>
    cipher = AES.new(secret + (BLOCK_SIZE - len(codecs.encode(secret, 'utf-8' )) % BLOCK_SIZE) * PADDING, AES.MODE_ECB)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

That would make sense if the code didn't specify utf-8 encoding, but it does. On top of that, when I run echo $LANG in the terminal I get en_GB.UTF-8, not ascii!

For reference, here's the original code as I received it:

# pip install pycryptodome
from Crypto.Cipher import AES
import base64

BLOCK_SIZE = 32

PADDING = '{'

# Encrypted text to decrypt
encrypted = "9l21XiUohS1j9kUx02KJNAkjw51pupGgiMlCkuVNEMo="

def decode_aes(c, e):
    return c.decrypt(base64.b64decode(e)).decode('latin-1').rstrip(PADDING)

secret = "password"

if secret[-1:] == "\n":
    print("Error, new line character at the end of the string. This will not match!")
elif len(secret.encode('utf-8')) >= 32:
    print("Error, string too long. Must be less than 32 bytes.")
else:
    # create a cipher object using the secret
    cipher = AES.new(secret + (BLOCK_SIZE - len(secret.encode('utf-8')) % BLOCK_SIZE) * PADDING, AES.MODE_ECB)

    # decode the encoded string
    decoded = decode_aes(cipher, encrypted)

    if decoded.startswith('FLAG:'):
        print("\n")
        print("Success: "+secret+"\n")
        print(decoded+"\n")
    else:
        print('Wrong password')

And here it is after I've modified it (unchanged portion omitted):

with open('words.txt', 'r') as f:
    words = f.readlines()

i = 0

while i < len(words):

    secret = words[i][0:-1]
    print(secret)

    if secret[-1:] == "\n":
        print("Error, new line character at the end of the string. This will not match!")
    elif len(secret.encode('utf-8')) >= 32:
        print("Error, string too long. Must be less than 32 bytes.")
    else:
        # create a cipher object using the secret
        cipher = AES.new(secret + (BLOCK_SIZE - len(secret.encode('utf-8')) % BLOCK_SIZE) * PADDING, AES.MODE_ECB)

        # decode the encoded string
        decoded = decode_aes(cipher, encrypted)

        if decoded.startswith('FLAG:'):
            print("\n")
            print("Success: "+secret+"\n")
            print(decoded+"\n")
            break
        else:
            print('Wrong password')
            print(i)
            i += 1

This works up until it reaches "Français".

I'm running it in Python 2.7. I've tried running it in python3, but then the import of Crypto.Cipher doesn't work (ModuleNotFoundError: No module named 'Crypto'), and I really don't know enough about Python to be able to fix that. I have also tried installing the codecs module and replacing secret.encode('utf-8') with codecs.encode(secret,'utf-8') but I get the same error.

I don't need anyone to solve the rest of this assignment for me. If there are other reasons my code won't work, don't tell me, I'll figure it out. But this encoding thing really doesn't seem like it's meant to be part of the challenge, and I'm stumped as to what to do next. Why is it using ASCII as its default encoding despite arguments to the contrary, and how do I fix it?
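
On the "why", here is a likely explanation, assuming words.txt is saved as UTF-8 and the script runs under Python 2.7 as described. In 2.7, open() hands back byte strings, and calling .encode('utf-8') (or codecs.encode) on a byte string makes Python first decode it to unicode implicitly with the default 'ascii' codec, which has nothing to do with $LANG. That hidden decode is what chokes on the 0xc3 byte of "ç". The sketch below is written for Python 3 and only mimics that implicit step. implicit_py2_encode and padded_key are hypothetical helper names, and the temporary file stands in for words.txt.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

BLOCK_SIZE = 32
PADDING = b'{'  # bytes, so it can be appended to an encoded key


def implicit_py2_encode(raw_bytes):
    """Mimic Python 2's str.encode('utf-8'): an ASCII decode happens first."""
    return raw_bytes.decode('ascii').encode('utf-8')


def padded_key(secret_text):
    """Hypothetical helper: encode the text once, then pad the bytes to BLOCK_SIZE."""
    key = secret_text.encode('utf-8')
    return key + (BLOCK_SIZE - len(key) % BLOCK_SIZE) * PADDING


# 1) Reproduce the error: these are the bytes a UTF-8 file holds for the word.
raw = u'Français'.encode('utf-8')
try:
    implicit_py2_encode(raw)
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)  # should name the 'ascii' codec and byte 0xc3, as in the traceback

# 2) One possible fix: decode explicitly while reading, so every word is text.
with tempfile.NamedTemporaryFile(suffix='.txt', delete=False) as tmp:
    tmp.write(u'password\nFrançais\n'.encode('utf-8'))  # stand-in for words.txt
    path = tmp.name

with io.open(path, 'r', encoding='utf-8') as f:
    words = [line.rstrip(u'\n') for line in f]
os.remove(path)

# Each key is now a 32-byte string, whatever characters the word contains.
keys = [padded_key(word) for word in words]
print([len(key) for key in keys])
```

Under 2.7 itself, since secret is already a UTF-8 byte string there, len(secret) should give the byte count with no encode call at all. As for the Python 3 import error, packages are installed per interpreter, so pycryptodome was probably only installed for 2.7; running python3 -m pip install pycryptodome is the usual way to add it for Python 3.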
