Create a list from a txt file with all Italian words, Python


I downloaded a dictionary with all the Italian words from here ( https://sourceforge.net/project/showfiles.php?group_id=128318&package_id=141110 ). I wanted to load the words into a list so I could use randint(0, list.count()) to pick a random Italian word as a string. The little program is basically "Guess the word". I followed this guide: "Creating a list of every word from a text file without spaces, punctuation" to see how to do this, but I got ASCII codec errors. Then I tried replacing the odd characters (‡ËÈÏÚ) with the characters they should be (àèèìù) and removed the first line of the thesaurus.txt file, but I still get the same ASCII error.

Here's the code, so that you can reproduce the ASCII error (you can download the .txt file from the link above). Thank you in advance for your help, I'm a newbie in Python.

import re
import random
file = open('thesaurus.txt', 'r')
# .lower() returns a version with all upper case characters replaced with lower case characters.
text = file.read().lower()
file.close()
# replaces anything that is not a lowercase letter, a space, or an apostrophe with a space:
text = re.sub("[^a-z ']+", " ", text)
words = text.split()

word = random.choice(words)

wordlist = list(word)

guess = []
prove = []

for l in range(len(word)):
    guess.append("_")
attempts_remaining = 6
print (guess)
print("You have", attempts_remaining, "attempts")
guessed = 0
while guessed < len(word) and attempts_remaining != 0:
    if attempts_remaining != 0:
        guessed_this_time = 0
        prova = input("Write a letter: ")
        prove.append(prova)

        if prove.count(prova) > 1:
            prova = input("You've already tried this letter! Try with another: ")


        for l in range(len(word)):
            if wordlist[l] == prova:
                guess[l] = wordlist[l]
                guessed += 1
                guessed_this_time += 1
        if guessed_this_time == 0:
            attempts_remaining -= 1
            print(attempts_remaining, "attempts remaining")
        elif guessed == len(word):
            print("You guessed it!")
        print(guess)
    if attempts_remaining == 0:
        print("You lost, the word was", word)

The error is:

Traceback (most recent call last):
  File "/Users/user/Documents/Python/Impiccato/impiccato.py", line 5, in <module>
    text = file.read().lower()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0x88 in position 2170: ordinal not in range(128)

Edit: I "solved" it by moving all the txt content into the .py file and using it this way:

text = """
here's all the list (the whole thing is 19414Ln LOL)
... """
words = re.sub("[^a-z ']+", " ", text).split()  # Stores the secret words in a list

This way commas are replaced by spaces.

There is 1 answer

Maltysen

It was the encoding. I don't know what encoding that website used for the file, but it's not the one you want. Just open the file in your favorite editor and set the encoding to UTF-8. It still won't work, because of another bug: .count() does not give you the number of items. You want len(words).
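
To make both points concrete, here is a minimal sketch, assuming thesaurus.txt has been re-saved as UTF-8 as suggested. The helper names load_words and pick_word are made up for illustration and are not part of the original program. Passing encoding= to open() means Python does not fall back to the platform default, which was ASCII in the traceback. The character class also keeps accented vowels, which is an assumption about what the word list should contain; with the original [^a-z ...] pattern they would be replaced by spaces.

```python
import random
import re


def load_words(path, encoding="utf-8"):
    """Read a word list file and return its words in lowercase."""
    # Name the encoding explicitly instead of relying on the platform
    # default. "utf-8" is an assumption: it must match how the file was saved.
    with open(path, "r", encoding=encoding) as f:
        text = f.read().lower()
    # Replace everything except lowercase letters, common Italian accented
    # vowels, apostrophes and spaces with a single space.
    text = re.sub(r"[^a-zàèéìíòóùú' ]+", " ", text)
    return text.split()


def pick_word(words):
    """Return one random word from the list."""
    # random.choice avoids index arithmetic. The index-based equivalent is
    # words[random.randint(0, len(words) - 1)], because randint includes
    # both endpoints.
    return random.choice(words)
```

In the game this would be used as words = load_words('thesaurus.txt') followed by word = pick_word(words). If the file is kept in the encoding it was downloaded in, pass that encoding's name as the second argument instead.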