Troubleshooting Python KeyError when printing random values from a dictionary of lists


I'm trying to make a Markov text generator but I keep getting a KeyError that I don't understand.

In this function, I keep getting a KeyError on the line w1, w2 = w2, random.choice(self.wordlog[w1]).

self.gensize is the number of words I want to generate,

self.size is the total number of words in my text,

self.wordlog is the dictionary mapping each word to the words that follow it, e.g. {'secondly': ['because', 'because'], 'pardon': ['cried', 'cried', 'said', 'said', 'your', 'she'], 'limited': ['to', 'warranty', 'right', 'right', 'to'], ...}

def generate(self):
    startseed = random.randint(0, self.size - 1)
    w1, w2 = self.words[startseed], self.words[startseed+1]
    # at this point, w1 and w2 are a random word and the word after it, e.g. 'alice ran'
    wordlist = [] 
    for i in range(self.gensize):
        wordlist.append(w1)
        w1, w2 = w2, random.choice(self.wordlog[w1])
    # e.g. self.wordlog['alice'] should return a list of all the words that follow 'alice'
    wordlist.append(w2)
    print wordlist

When I run the function (print markov("alice.txt", 5).generate()), I keep getting a KeyError, with a different word each time (which is to be expected, since the starting seed and random.choice differ on every run).

Can anyone see what's wrong with this and how to fix it?

EDIT:

Here's the rest of the code, so you can see where self.words and everything else is coming from:

import random
import string

class markov(object):
    def __init__(self, filename, gensize):
        self.wordlog = {}
        self.filename = filename
        self.words = self.file_to_words()
        self.size = len(self.words)
        self.gensize = gensize

    def file_to_words(self):
        with open(self.filename, "r") as file_opened:
            text = file_opened.read().translate(None, string.punctuation)
            mixedcasewords = text.split()
            words = [x.lower() for x in mixedcasewords]
            return words

    def doubles(self):
        for i in range((self.size)-1):
            yield (self.words[i], self.words[i+1])       

    def catalog(self):
        for w1, w2 in self.doubles():
            self.wordlog.setdefault(w1, []).append(w2)
        print self.wordlog
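As posted, nothing calls catalog(): markov("alice.txt", 5).generate() goes straight from __init__ to generate(), so self.wordlog is still the empty {} and the very first lookup raises KeyError for whichever random word was picked, a different one on each run. Below is a minimal Python 3 sketch, assuming catalog() is meant to run before generate(). The hypothetical Markov class calls it from __init__ and takes the text as a string instead of a filename so it can be tried without alice.txt. Separately, randint(0, self.size - 1) can return the last index, in which case self.words[startseed+1] raises IndexError; randint(0, self.size - 2) avoids that.

```python
import string


class Markov:
    """Python 3 sketch of the posted markov class (hypothetical port).

    Takes the text directly instead of a filename.
    """

    def __init__(self, text, gensize):
        self.gensize = gensize
        # Python 3 replacement for text.translate(None, string.punctuation):
        # the third argument of str.maketrans lists characters to delete.
        table = str.maketrans("", "", string.punctuation)
        self.words = text.translate(table).lower().split()
        self.size = len(self.words)
        self.wordlog = {}
        # Build the successor table up front, so generate() never sees an empty dict.
        self.catalog()

    def catalog(self):
        # Same pairs as doubles(): each word with the word right after it.
        for w1, w2 in zip(self.words, self.words[1:]):
            self.wordlog.setdefault(w1, []).append(w2)


m = Markov("Alice ran. Alice sat!", 3)
print(m.wordlog)  # {'alice': ['ran', 'sat'], 'ran': ['alice']}
```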

Answer by DuniC

I think that's because you're using random.choice with a dict instead of a sequence such as a list or tuple (random.choice indexes its argument by position, so a set doesn't work either).
It's difficult to say, but you could check self.wordlog just to make sure:

for k, v in self.wordlog.items():
    if not isinstance(v, list):
        print("This shouldn't happen! Check: '" + k + "'")
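For reference, random.choice picks a random integer position and indexes its argument with it. The short Python 3 sketch below (choice_error is a hypothetical helper) shows what that means per container type: a dict with word keys raises KeyError for an integer such as 0, not for a word, and a set raises TypeError because it cannot be indexed. So a KeyError that names a word from the text points at the self.wordlog[w1] lookup rather than at random.choice.

```python
import random


def choice_error(container):
    """Return the exception random.choice raises for container, or None."""
    try:
        random.choice(container)
    except (KeyError, TypeError) as exc:
        return exc
    return None


# A dict is indexed by position, so the missing "key" is an integer.
print(choice_error({"alice": ["ran"]}).args)  # (0,)
# A set has no positions at all.
print(type(choice_error({"ran", "away"})).__name__)  # TypeError
# A list works.
print(choice_error(["ran", "away"]))  # None
```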

[EDIT] Maybe the loop simply reaches a word that isn't a key in wordlog while trying to produce the requested gensize words.

print markov("alice.txt", 5).generate()

starts a for loop with five iterations. In each iteration you need to be sure that the word being looked up, w1, is actually a key of wordlog.
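Concretely, doubles() stops at the pair (words[-2], words[-1]), so a word that occurs only at the very end of the text appears as a value in wordlog but never as a key; if the chain ever picks it, the next lookup raises KeyError. A small Python 3 diagnostic (dead_ends is a hypothetical helper) lists such words:

```python
def dead_ends(wordlog):
    """Return, sorted, the words that appear as successors but never as keys.

    Looking up any of these words in wordlog raises KeyError.
    """
    successors = {w for followers in wordlog.values() for w in followers}
    # set.difference accepts any iterable; iterating a dict yields its keys.
    return sorted(successors.difference(wordlog))


# 'sat' only occurs as the last word of "alice ran alice sat", so it has no entry.
print(dead_ends({"alice": ["ran", "sat"], "ran": ["alice"]}))  # ['sat']
```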
To make sure this isn't a problem, you can do one of two things:

Approach 1

Check whether w1 is in wordlog, and break out of the loop if it isn't.
This approach may produce fewer words than the requested gensize.
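A minimal Python 3 sketch of Approach 1, assuming wordlog is the table built by catalog(); generate_until_dead_end is a hypothetical name. It walks a first-order chain by always looking up the most recently appended word. Note that the posted loop appends w1 but then draws from self.wordlog[w1], so each new word is a successor of w1 yet is placed after w2; looking up the newest word instead keeps every adjacent pair in the output a pair that occurred in the text.

```python
import random


def generate_until_dead_end(wordlog, start, gensize, rng=random):
    """Return at most gensize words, stopping early at a word with no successors."""
    words = []
    current = start
    while len(words) < gensize:
        words.append(current)
        if current not in wordlog:
            break  # dead end: the text never shows a word after `current`
        # Look up the newest word, so each adjacent pair occurred in the text.
        current = rng.choice(wordlog[current])
    return words


print(generate_until_dead_end({"alice": ["ran"], "ran": ["away"]}, "alice", 5))
# ['alice', 'ran', 'away']
```

With gensize=5 and a table that dead-ends at 'away', only three words come back, which is the shortfall mentioned above.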

Approach 2

Make sure it works for any given gensize.
You can do this easily by linking the wordlog keys and values into loops, so that every word that appears as a value is also a key,
like in {'a': ['b', 'a'], 'b': ['b', 'a']}.
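One way to close the loops, sketched in Python 3 under the assumption that wrapping around is acceptable: treat the text as circular and add a single extra pair from the last word back to the first. Every word that appears as a value then also appears as a key, so the walk can run for any gensize. build_closed_wordlog and generate are hypothetical names.

```python
import random


def build_closed_wordlog(words):
    """Successor table in which every word is also a key.

    Pairs each word with the next one and adds one wrap-around pair
    (last word -> first word), treating the text as circular.
    """
    wordlog = {}
    for w1, w2 in zip(words, words[1:] + words[:1]):
        wordlog.setdefault(w1, []).append(w2)
    return wordlog


def generate(wordlog, start, gensize, rng=random):
    """Walk the chain for exactly gensize words; start must be a word from the text."""
    words = []
    current = start
    for _ in range(gensize):
        words.append(current)
        # Never raises KeyError: every successor is itself a key.
        current = rng.choice(wordlog[current])
    return words


wordlog = build_closed_wordlog("alice ran alice sat".split())
print(wordlog)  # {'alice': ['ran', 'sat'], 'ran': ['alice'], 'sat': ['alice']}
print(len(generate(wordlog, "alice", 20)))  # 20
```

The trade-off is one pair that never occurred in the text (last word to first word), so the output can occasionally jump back to the start of the book.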