How to avoid global variable and repetition in python

415 views Asked by At

I wrote a script that goes through a list of strings and looks for matches in either (1) lines in a file or (2) filenames in a directory.

Depending on the search mode (user input), each string is appended to a list if (1) it matches the line/filename exactly or (2) the line/filename contains it.

import os
import operator

query_list = ["look", "for", "these"]

search_object = "name_of_file_or_directory"

if os.path.isfile(search_object):
    input_object = 1
    with open(search_object, "r") as file:
        lines = file.read().split("\n")

elif os.path.isdir(search_object):
    input_object = 2

search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")

matched = []

def comparison():
    if search_mode == "1" and operator.eq(string, query) or search_mode == "2" and operator.contains(string, query):
        matched.append(string)
    return matched

for query in query_list:

    def search_loop(container):
    # container is either list of filenames or list of lines
        global string
        for string in container:
            matched = comparison()
        return matched

    if input_object == 1:
        matched = search_loop(lines)

    elif input_object == 2:
        matched = search_loop(os.listdir(search_object))

print(matched)

I defined the comparison() function so I can use it more often later in the script without repeating those lines. However, I had to assign global string within the other function then, as I get the following error otherwise. NameError: name 'string' is not defined

I'm wondering how I could avoid using a global variable. I read that classes may often be used to avoid such, but I couldn't figure out how this would be useful here. I'd appreciate any recommendations on how to solve this task in a nice way.

1

There are 1 answers

0
Chris Hunt On BEST ANSWER

There are a few things we can do to clean up the code. We'll end up getting rid of the global variable by the end.

Make data consistent

Instead of waiting until the end to get the directory contents, we can get it at the very beginning. This way we can treat the lines of text and the directory contents exactly the same later on.

import os
import operator

query_list = ["look", "for", "these"]

search_object = "name_of_file_or_directory"

if os.path.isfile(search_object):
    with open(search_object, "r") as file:
        lines = file.read().split("\n")

elif os.path.isdir(search_object):
    lines = os.listdir(search_object)

# At this point, lines is just a list of text - we don't care where it came from.

search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")

matched = []

def comparison():
    if search_mode == "1" and operator.eq(string, query) or search_mode == "2" and operator.contains(string, query):
        matched.append(string)
    return matched

for query in query_list:

    def search_loop(container):
        # container is either list of filenames or list of lines
        global string
        for string in container:
            matched = comparison()
        return matched

    # We were able to remove checks from here
    matched = search_loop(lines)

print(matched)

When it makes sense, select the correct operation early

Before, we were checking each time which kind of operation to do (exact match or contains). Instead, we can select the option right when we get the user input and assign it to a single name to be used later on.

import os
import operator

query_list = ["look", "for", "these"]

search_object = "name_of_file_or_directory"

if os.path.isfile(search_object):
    with open(search_object, "r") as file:
        lines = file.read().split("\n")

elif os.path.isdir(search_object):
    lines = os.listdir(search_object)

search_mode = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
# Because the interface of both functions is the same we can use them directly.
# If they were not the same then we could write an adapter function that would make them have
# the same signatures so they could be used in the same way below.
if search_mode == "1":
    search_op = operator.eq
elif search_mode == "2":
    search_op = operator.contains

matched = []

for query in query_list:
    for string in lines:
        if search_op(string, query):
            matched.append(string)

print(matched)

Reorganize code into functions

Now that we have a better picture of the parts of the program, we can break it into functions. Using functions gives parts of the program concrete names and also helps narrow the scope of the variables they use. The fewer variables used in any piece of code the easier it is to understand later. We avoid global variables by choosing our functions carefully - one important criteria we might use is that everything in the function is self-contained and doesn't need to reference anything outside.

We'll also use the if __name__ == '__main__' convention so we could include our script from another script if we wanted to.

import os
import operator


def get_contents(search_object):
    if os.path.isfile(search_object):
        with open(search_object, "r") as file:
            return file.read().split("\n")
    elif os.path.isdir(search_object):
        return os.listdir(search_object)


def get_search_function(user_input):
    if search_mode == "1":
        return operator.eq
    elif search_mode == "2":
        return operator.contains


def find_matches(match_function, query_list, lines):
    matched = []
    for query in query_list:
        for string in lines:
            if match_function(string, query):
                matched.append(string)
    return matched


if __name__ == "__main__":
    search_object = "name_of_file_or_directory"
    lines = get_contents(search_object)

    search_mode_input = input("Are you looking for objects that (1) exactly match or (2) contain the query? ")
    search_function = get_search_function(search_mode_input)

    query_list = ["look", "for", "these"]
    matches = find_matches(search_function, query_list, lines)
    print(matches)

Further improvements

  1. Error handling - wherever something may go wrong we can raise an appropriate error (for something that can be handled) or assert (for something that we don't expect to happen and cannot really recover from).
  2. Modules like argparse can make it easier to take arguments into the script so we don't have to hard-code variable values.
  3. Instead of taking in a "search object" and then having to worry about whether it is a file or directory, we could also read from sys.stdin and leave it to the user to generate text somehow and pipe it to the program, like: cat words.txt | python script.py for contents or ls . | python script.py for files in a directory.