I'm trying to make a Markov text generator, but I keep getting a KeyError that I don't understand.
In this function, the KeyError is raised on the line w1, w2 = w2, random.choice(self.wordlog[w1]).
self.gensize is the number of words I want to generate, self.size is the total number of words in my text, and self.wordlog is the dictionary, i.e. {'secondly': ['because', 'because'], 'pardon': ['cried', 'cried', 'said', 'said', 'your', 'she'], 'limited': ['to', 'warranty', 'right', 'right', 'to'], etc...}
def generate(self):
    startseed = random.randint(0, self.size - 1)
    w1, w2 = self.words[startseed], self.words[startseed+1]
    # at this point, w1 and w2 are a random word and a following word, i.e. alice ran
    wordlist = []
    for i in range(self.gensize):
        wordlist.append(w1)
        w1, w2 = w2, random.choice(self.wordlog[w1])
        # i.e. self.wordlog[alice] should return a list of all the values the word alice precedes
    wordlist.append(w2)
    print wordlist
When I run the function (print markov("alice.txt", 5).generate()), I keep getting a KeyError on a different word each time, which is to be expected given the random starting seed and random.choice.
Does anyone see what's wrong with this and how to fix it?
EDIT:
Here's the rest of the code, so you can see where self.words and everything else come from:
class markov(object):
    def __init__(self, filename, gensize):
        self.wordlog = {}
        self.filename = filename
        self.words = self.file_to_words()
        self.size = len(self.words)
        self.gensize = gensize
    def file_to_words(self):
        with open(self.filename, "r") as file_opened:
            text = file_opened.read().translate(None, string.punctuation)
            mixedcasewords = text.split()
            words = [x.lower() for x in mixedcasewords]
            return words
    def doubles(self):
        for i in range((self.size)-1):
            yield (self.words[i], self.words[i+1])
    def catalog(self):
        for w1, w2 in self.doubles():
            self.wordlog.setdefault(w1, []).append(w2)
        print self.wordlog
I think that's because you're using random.choice with a dict instead of a sequence such as a list or tuple.
It's difficult to say, but you should check self.wordlog just to make sure.
[EDIT] Maybe the loop, while trying to reach the given gensize, hits a key that doesn't exist.
With gensize set to 5, generate() starts a for loop with five iterations. On each iteration you need to be sure that the randomly picked word w1 is actually a key of wordlog.
To ensure this isn't a problem, you can do one of two things:
Approach 1
Check that w1 is in wordlog, or else break out of the loop.
This approach may give a result shorter than the requested gensize.
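A minimal sketch of Approach 1, written as a standalone function rather than as the method from the question. The name generate_words and its parameters are hypothetical, and the walk is simplified to track one current word; the point is only the membership check before the lookup.

```python
import random

def generate_words(wordlog, start, gensize):
    """Follow the chain for at most gensize words (hypothetical helper).

    Stops early when the current word has no recorded successors,
    so the result may be shorter than gensize.
    """
    wordlist = []
    w1 = start
    for _ in range(gensize):
        wordlist.append(w1)
        if w1 not in wordlog:
            # Dead end: w1 is not a key of wordlog, so looking it up
            # would raise KeyError. Stop here instead.
            break
        w1 = random.choice(wordlog[w1])
    return wordlist
```

With a wordlog such as {'a': ['b'], 'b': ['c']}, starting from 'a' and asking for five words, the walk stops after 'c' because 'c' is not a key.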
Approach 2
Make sure it works for ANY given gensize.
You can do this easily by linking the wordlog keys and values in loops, so that every word that appears as a value is also a key, as in
{'a':['b','a'],'b':['b','a']}
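One way to get such a closed wordlog, sketched under an assumption that is not in the question: treat the text as circular, so the last word is followed by the first. The function name build_wordlog is hypothetical. With the wrap-around pair added, every word in the text has at least one successor, so no lookup can hit a missing key.

```python
def build_wordlog(words):
    """Map each word to the list of words that follow it (hypothetical helper).

    Assumption: the last word wraps around to the first, so every
    word that appears as a value also appears as a key.
    """
    wordlog = {}
    for i, w1 in enumerate(words):
        # The modulo makes the final word pair up with words[0].
        w2 = words[(i + 1) % len(words)]
        wordlog.setdefault(w1, []).append(w2)
    return wordlog
```

For the word list ['a', 'b', 'b', 'a'] this produces exactly the dictionary shown above. The trade-off is one artificial transition from the end of the text back to its beginning.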