I'm trying to make a simple word counter program in Python 3.4.1 where the user is to input a list of comma-separated words that are then analyzed for frequency in a sample text file.
I'm currently stuck on how to search for the entered list of words in the text file.
First I tried:
file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])
This resulted in:
What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1
If that's anything to go by, I'm guessing this method only gave me the count of the words in the input list and not the count of the input list of words in the text file. So then I tried:
file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])
This gave me nothing back. This is what happened:
What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>>
What am I doing wrong? How can I fix this problem?
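You read all lines first (into lines), then tried to read just one more line, but by then the file had already given you all of its lines. In that case f.readline() returns an empty string. From there on out your script is doomed to fail; you cannot count words in an empty line. Your loop also compares each whole line, trailing newline included, against the search words rather than the individual words in it, so nothing matches and nothing is printed.
You can loop over the file instead: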
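A minimal sketch of that loop, consistent with the explanation that follows. The count_words function and its parameter names are hypothetical, introduced only so the logic can be called without interactive input; it also assumes words are separated by whitespace, so punctuation attached to a word is not stripped.

```python
def count_words(path, search_words):
    """Count how often each search word occurs in the file at `path`.

    `count_words`, `path` and `search_words` are hypothetical names used
    for this sketch only.
    """
    # Normalise the search words: drop surrounding whitespace, lowercase.
    cleaned = [word.strip().lower() for word in search_words]

    # Pre-define every search word with a count of 0, so words that never
    # occur in the file are still reported.
    count = dict.fromkeys(cleaned, 0)

    # The with statement closes the file again when the block ends.
    with open(path, 'r') as f:
        # Iterate line by line instead of reading everything with readlines().
        for line in f:
            # Assumption: words are separated by whitespace; punctuation
            # stays attached to the word it follows.
            for word in line.lower().split():
                if word in count:
                    # found a word you wanted to count, so count it
                    count[word] += 1
    return count
```

With the prompts from the question, this could be called as count_words(file, search.split(",")), where file and search are the raw strings returned by input(), and the result printed with the same sorted loop as before.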
The with statement uses the opened file object as a context manager; this just means it'll be closed again automatically when done.
The for line in f: loop iterates over each separate line in the input file; this is more efficient than using f.readlines() to read all lines into memory at once.
I also cleaned up your search word stripping a little, and set the count dictionary to one with all the search words pre-defined to 0; this makes the actual counting a little easier.
Because you now have a dictionary with all the search words, testing for matching words is best done against that dictionary. Testing against a dictionary is faster than testing against a list (the latter is a scan that takes longer the more words are in the list, while a dictionary test takes constant time on average, regardless of the number of items in the dictionary).
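To make the list-versus-dictionary point concrete, here is a rough sketch that pre-defines the counts with dict.fromkeys() and times both membership tests with timeit. The word list is made up for illustration, and the timings depend on the machine, so treat the numbers as indicative only.

```python
import timeit

# Hypothetical search list, large enough for the difference to show up.
search = ["word{}".format(i) for i in range(10000)]

# One key per search word, each starting at 0.
count = dict.fromkeys(search, 0)

# Worst case for the list: the item being looked up is the last one.
needle = search[-1]

# Both tests give the same answer; only the cost differs.
assert (needle in search) == (needle in count)

# The list test scans item by item; the dictionary test is a hash lookup.
list_time = timeit.timeit(lambda: needle in search, number=200)
dict_time = timeit.timeit(lambda: needle in count, number=200)

print("list scan: {:.6f}s".format(list_time))
print("dict test: {:.6f}s".format(dict_time))
```

For the handful of search words a user is likely to type, the difference is negligible; it matters once the search list grows or the input file is large.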