I wrote a script that goes through a list of strings and looks for matches in either (1) lines in a file or (2) filenames in a directory.
Depending on the search mode (user input), each string is appended to a list if (1) it matches the line/filename exactly or (2) the line/filename contains it.
import os
import operator

query_list = ["look", "for", "these"]
search_object = "name_of_file_or_directory"

if os.path.isfile(search_object):
    input_object = 1
    with open(search_object, "r") as file:
        lines = file.read().split("\n")
elif os.path.isdir(search_object):
    input_object = 2

search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
matched = []

def comparison():
    if search_mode == "1" and operator.eq(string, query) or search_mode == "2" and operator.contains(string, query):
        matched.append(string)
    return matched

for query in query_list:
    def search_loop(container):
        # container is either list of filenames or list of lines
        global string
        for string in container:
            matched = comparison()
        return matched
    if input_object == 1:
        matched = search_loop(lines)
    elif input_object == 2:
        matched = search_loop(os.listdir(search_object))

print(matched)
I defined the comparison() function so I can reuse it later in the script without repeating those lines. However, I then had to declare global string inside the other function, because otherwise I get the following error:
NameError: name 'string' is not defined
I'm wondering how I could avoid using a global variable. I read that classes may often be used to avoid such, but I couldn't figure out how this would be useful here. I'd appreciate any recommendations on how to solve this task in a nice way.
There are a few things we can do to clean up the code. We'll end up getting rid of the global variable by the end.
Make data consistent
Instead of waiting until the end to get the directory contents, we can get them at the very beginning. This way we can treat the lines of text and the directory contents exactly the same later on.
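As a rough sketch of that first step, both cases can be reduced to a plain list of strings up front. The helper name load_items is hypothetical, and splitlines() is swapped in for the original split("\n") so that a trailing newline does not produce an empty last entry.

```python
import os


def load_items(search_object):
    """Return the strings to search: lines of a file, or names in a directory."""
    if os.path.isfile(search_object):
        with open(search_object, "r") as file:
            # splitlines() drops the line endings (assumption: a trailing
            # empty entry, as produced by split("\n"), is not wanted).
            return file.read().splitlines()
    if os.path.isdir(search_object):
        return os.listdir(search_object)
    raise ValueError(f"{search_object!r} is neither a file nor a directory")
```

Everything after this point only ever sees a list of strings and no longer needs to know where they came from.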
When it makes sense, select the correct operation early
Before, we were checking each time which kind of operation to do (exact match or contains). Instead, we can select the option right when we get the user input and assign it to a single name to be used later on.
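One way this could look is a small lookup from the user's answer to a comparison function. The names COMPARISONS and select_comparison are made up for illustration; operator.eq and operator.contains are the same functions the original script uses.

```python
import operator

# Hypothetical lookup table: maps the user's answer to a comparison function.
COMPARISONS = {
    "1": operator.eq,        # exact match
    "2": operator.contains,  # contains(string, query) means `query in string`
}


def select_comparison(search_mode):
    """Return the comparison function for the chosen search mode."""
    try:
        return COMPARISONS[search_mode]
    except KeyError:
        raise ValueError(f"unknown search mode: {search_mode!r}") from None
```

The result can be bound once, for example compare = select_comparison(search_mode), and then called as compare(string, query) inside the loop without looking at search_mode again.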
Reorganize code into functions
Now that we have a better picture of the parts of the program, we can break it into functions. Using functions gives parts of the program concrete names and also helps narrow the scope of the variables they use. The fewer variables used in any piece of code, the easier it is to understand later. We avoid global variables by choosing our functions carefully: one important criterion is that everything in a function is self-contained and doesn't need to reference anything outside it.
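A minimal sketch of a self-contained search function follows. The name find_matches is hypothetical, and main uses hard-coded sample data in place of reading the file or directory and prompting the user. Everything the function needs arrives as a parameter, so no global declaration is required.

```python
import operator


def find_matches(queries, items, compare):
    """Return every item that matches a query, grouped in query order.

    compare(item, query) decides what counts as a match, for example
    operator.eq (exact) or operator.contains (substring).
    """
    matched = []
    for query in queries:
        for item in items:
            if compare(item, query):
                matched.append(item)
    return matched


def main():
    # Sample data (assumption): stands in for file lines or directory names.
    items = ["look", "lookup.txt", "for", "other"]
    print(find_matches(["look", "for", "these"], items, operator.contains))


if __name__ == "__main__":
    main()
```

Run as a script, this prints ['look', 'lookup.txt', 'for']. Passing operator.eq instead gives ['look', 'for'].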
We'll also use the if __name__ == '__main__' convention so we could import our script from another script if we wanted to.
Further improvements
- When something goes wrong, raise an appropriate error (for something that can be handled) or use assert (for something that we don't expect to happen and cannot really recover from).
- argparse can make it easier to take arguments into the script so we don't have to hard-code variable values.
- We could read the text to search from sys.stdin and leave it to the user to generate text somehow and pipe it to the program, like: cat words.txt | python script.py for contents or ls . | python script.py for files in a directory.
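The argparse and sys.stdin suggestions might be combined along these lines. This is only a sketch: the --exact flag, the positional queries argument, and the function names are all assumptions made for illustration, and unlike the original script it prints each matching line once as it is found.

```python
import argparse
import operator
import sys


def parse_args(argv=None):
    """Parse command-line arguments (argv=None means use sys.argv[1:])."""
    parser = argparse.ArgumentParser(
        description="Print input lines that match any of the queries.")
    parser.add_argument("queries", nargs="+", help="strings to look for")
    parser.add_argument("--exact", action="store_true",
                        help="require an exact match instead of a substring match")
    return parser.parse_args(argv)


def read_items(stream):
    """Return the lines of a text stream without their trailing newlines."""
    return [line.rstrip("\n") for line in stream]


def main(argv=None, stream=None):
    args = parse_args(argv)
    items = read_items(sys.stdin if stream is None else stream)
    compare = operator.eq if args.exact else operator.contains
    for item in items:
        if any(compare(item, query) for query in args.queries):
            print(item)

# To use this as a script, call main() under the
# if __name__ == '__main__' guard.
```

With that guard in place, ls . | python script.py look for would search the names in the current directory, and cat words.txt | python script.py --exact look would search the lines of words.txt for exact matches.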