How can I search a text file for a list of words from user input?

I'm trying to make a simple word counter program in Python 3.4.1 where the user is to input a list of comma-separated words that are then analyzed for frequency in a sample text file.

I'm currently stuck on how to search for the entered list of words in the text file.

First I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This resulted in:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1

If that's anything to go by, I'm guessing this method only counted the words in my input list, not how often those words appear in the text file. So then I tried:
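
To confirm that guess, here is a minimal sketch of the first attempt's counting loop with the search terms hard-coded (an assumption; the original reads them with input()). The loop only iterates over search, and the opened file f is never read, so every search word ends up with a count of exactly 1:

```python
# Search terms hard-coded for illustration (the original reads them via input()).
search = ["first", "rings", "the"]

count = {}
for word in search:          # iterates over the search terms, not over the file
    count[word] = count.get(word, 0) + 1

for word in sorted(count):
    print(word, count[word])  # each word is printed with a count of 1
```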

file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This gave me nothing back. This is what happened:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>> 

What am I doing wrong? How can I fix this problem?

There are 2 answers

Answer by Martijn Pieters (accepted answer):

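You read all lines first (into lines), then tried to read just one more line, but the file had already given you all of its lines. In that case f.readline() returns an empty string, so word = line.split() gives you an empty list, which you then never use anyway: for word in lines: loops over whole lines (each still ending in a newline character), not over individual words. No entry in lines ever equals one of your search words, so nothing gets counted and nothing is printed.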
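
Both problems can be reproduced without a real file. This sketch uses io.StringIO as a stand-in for the opened file, with made-up sample text: after readlines() the file position is at the end, so readline() returns an empty string, and the list from readlines() holds whole lines that never match a single search word:

```python
import io

# io.StringIO stands in for the real file; the text is made up for illustration.
f = io.StringIO("On the first day of Christmas\nfive golden rings\n")

lines = f.readlines()   # consumes the whole file
line = f.readline()     # the position is already at the end, so this is empty

print(repr(line))       # ''
print(lines)            # whole lines, each still ending in '\n'

# The second attempt compared whole lines against single words:
search = ["first", "the", "rings"]
print([ln for ln in lines if ln in search])   # [] -> nothing is ever counted
```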

You can loop over the file instead:

file = input("What file would you like to open? ")

search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]

# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)

with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
                count[word] += 1

for word in sorted(count):
    print(word, count[word])
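
One limitation of splitting on whitespace is that punctuation stays attached, so "rings," would not be counted as "rings". A minimal sketch of the same loop wrapped in a hypothetical helper, count_search_words (not part of the answer above), that also strips leading and trailing punctuation with string.punctuation; the sample text is made up:

```python
import io
import string

def count_search_words(lines, search):
    """Count how often each word in `search` appears in an iterable of lines.

    Hypothetical helper: same logic as the loop above, plus stripping of
    leading/trailing punctuation so that "rings," counts as "rings".
    """
    count = dict.fromkeys(search, 0)
    for line in lines:
        for word in line.lower().split():
            word = word.strip(string.punctuation)
            if word in count:
                count[word] += 1
    return count

# Made-up sample text; an open file object can be passed the same way.
sample = io.StringIO("On the first day, my true love gave to me\nfive golden rings,\n")
result = count_search_words(sample, ["first", "rings", "the"])
for word in sorted(result):
    print(word, result[word])
```

With a real file, pass the file object opened in a with block, e.g. count_search_words(f, search).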

The with statement uses the opened file object as a context manager; this just means it'll be closed again automatically when done.
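
A small sketch of that automatic closing, using a throwaway temporary file (its contents are made up) so the example runs on its own: f.closed is False inside the with block and True as soon as the block is left:

```python
import os
import tempfile

# Write a throwaway file (hypothetical contents) so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first rings the\n")
    path = tmp.name

with open(path, "r") as f:
    print(f.closed)   # False: still open inside the block
    data = f.read()

print(f.closed)       # True: closed automatically on leaving the block
os.remove(path)       # clean up the temporary file
```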

The for line in f: loop iterates over each separate line in the input file; this is more memory-efficient than using f.readlines(), which reads all lines into a list at once.
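
The reason this works is that a file object is its own iterator: each step hands back the next line, without building a list of every line first. A sketch using io.StringIO as a stand-in for a text file opened with open(..., "r"):

```python
import io

# StringIO behaves like a text file opened with open(..., "r").
f = io.StringIO("line one\nline two\nline three\n")

print(iter(f) is f)   # True: the file object is its own iterator

first = next(f)       # reads only the first line
print(repr(first))    # 'line one\n'

# The for loop continues from the current position, one line per iteration.
rest = []
for line in f:
    rest.append(line.rstrip("\n"))
print(rest)           # ['line two', 'line three']
```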

I also cleaned up your search-word stripping a little, and initialized the count dictionary with every search word set to 0; this makes the actual counting a little easier.
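
To illustrate what dict.fromkeys(search, 0) sets up, here is a short sketch with hard-coded search words and a made-up sentence. Because every search word already exists as a key, the loop can use a plain += 1, and words that were not searched for are simply skipped. The output is sorted because dictionary order is not guaranteed on older versions such as Python 3.4:

```python
search = ["first", "rings", "the"]

# Every search word starts at 0, so the counting loop can simply do += 1.
count = dict.fromkeys(search, 0)
print(sorted(count.items()))   # [('first', 0), ('rings', 0), ('the', 0)]

# Words that were never searched for are not keys, so they are skipped.
for word in "the first of the rings".split():
    if word in count:
        count[word] += 1
print(sorted(count.items()))   # [('first', 1), ('rings', 1), ('the', 2)]
```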

Because you now have a dictionary with all the search words, testing for matching words is best done against that dictionary. Testing against a dictionary is faster than testing against a list (the latter is a scan that takes longer the more words are in the list, while a dictionary test takes constant time on average, regardless of the number of items in the dictionary).
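
A rough way to see the difference is to time both membership tests with timeit. The word list below is made up, and the measured times depend on the machine, so no expected timings are shown; the point is that both tests agree on the answer while the list test has to scan up to every element:

```python
import timeit

# Hypothetical search terms: 10,000 made-up words.
words = ["word%d" % i for i in range(10000)]
as_list = words
as_dict = dict.fromkeys(words, 0)

target = "word9999"   # worst case for the list: it sits at the very end

# Both tests give the same answer...
print(target in as_list, target in as_dict)   # True True

# ...but the list test compares element by element, the dict test hashes once.
list_time = timeit.timeit(lambda: target in as_list, number=1000)
dict_time = timeit.timeit(lambda: target in as_dict, number=1000)
print("list: %.5f s, dict: %.5f s" % (list_time, dict_time))
```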

Answer by Jerome Anthony:

You could try this:

import re
import collections

wanted = ["cat", "dog"]
with open('hamlet.txt') as f:
    matches = re.findall(r'\w+', f.read().lower())  # every word, lowercased, punctuation dropped
counts = collections.Counter(matches)  # Count each occurrence of every word
print([(word, counts[word]) for word in wanted])  # Print the counts for the wanted words

I referenced this solution when forming the answer.
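
The same Counter approach can be wrapped in a small reusable function. In this sketch, count_wanted is a hypothetical name and the sample sentence is made up. A Counter returns 0 for words it never saw instead of raising KeyError, so wanted words that do not occur still appear in the result. Note that \w+ splits "cat's" into "cat" and "s", which may or may not be what you want:

```python
import collections
import re

def count_wanted(text, wanted):
    """Return {word: count} for each word in `wanted` (hypothetical helper)."""
    # \w+ picks out runs of letters, digits and underscores, dropping punctuation.
    words = re.findall(r"\w+", text.lower())
    counts = collections.Counter(words)
    # A Counter returns 0 for missing keys instead of raising KeyError.
    return {w: counts[w] for w in wanted}

sample = "The cat saw the dog. The dog, however, ignored the cat's bowl."
print(sorted(count_wanted(sample, ["cat", "dog", "bird"]).items()))
# [('bird', 0), ('cat', 2), ('dog', 2)]
```

To use it on a file, read the text inside a with block and pass f.read() as text.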