Moving files with subfolders based on dictionary Python


I am trying to find a faster way to copy and move PDF files that are spread across multiple subfolders. My current code finds a file wherever it is under the main folder, but this takes a very long time given the number of subfolders and files. In effect, my current code has written Python the following letter:

Hiya Python! See that last column in the list I sent ya? Just look in all the folders and subfolders in this location where all of these folders and subfolders exist to find that pdf. When you do, copy and move it to this other location I sent you. Take your time!

The letter I would like to write to Python:

Dear Python, Take this list. The first column is the folder, the second is the subfolder, and the last is the pdf. I'm going to give you a location where all of these folders and subfolders exist. Once in that location, open the first folder, then search for the second folder. When you find the second folder, open this folder and search for the pdf. Once you find the pdf, make a copy and move this to this other location I sent you.
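The second letter translates fairly directly into code: because each row already names the folder and subfolder, the PDF's location can be built with a path join instead of searched for. Below is a minimal sketch, assuming the three columns are exact folder and file names sitting directly under the base path; copy_listed_pdf is a hypothetical helper name, not something from the original code.

```python
import shutil
from pathlib import Path


def copy_listed_pdf(base_path, folder, subfolder, pdf_name, destination_path):
    """Copy base_path/folder/subfolder/pdf_name into destination_path.

    Assumes the list gives exact folder, subfolder and file names.
    Returns the new file's Path, or None if the source does not exist.
    No directory tree is walked: the source path is built directly.
    """
    source = Path(base_path) / folder / subfolder / pdf_name
    if not source.is_file():
        return None
    dest_dir = Path(destination_path)
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    # copy2 copies the file plus its timestamps; when the target is a
    # directory it returns the path of the newly created file
    return Path(shutil.copy2(source, dest_dir))
```

Called once per row of the list, this only touches the files that are named in it, so the run time no longer depends on how many other subfolders exist under the base path.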

For some reason, I'm having a hard time wrapping my head around the next steps. May I please have your expertise (fairly new PyUser here)?

The files I need are listed in a pipe-delimited .txt file (folder|subfolder|file):

TESTFOLDER_FROM0|TESTSUBFOLDER_FROM0|TEST1.pdf
TESTFOLDER_FROM0|TESTSUBFOLDER_FROM0|TEST2.pdf
TESTFOLDER_FROM1|TESTSUBFOLDER_FROM1|TEST3.pdf
TESTFOLDER_FROM2|TESTSUBFOLDER_FROM2|TEST4.pdf
TESTFOLDER_FROM3|TESTSUBFOLDER_FROM5|TEST5.pdf
TESTFOLDER_FROM5|TESTSUBFOLDER_FROM8|TEST6.pdf
TESTFOLDER_FROM637|TESTSUBFOLDER_FROM11|TEST7.pdf

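Each line of this list has three |-separated fields, so csv.reader with delimiter='|' (the same approach the original code uses) can split it. Here is a small sketch of a parser that also skips blank lines and rejects rows that do not have exactly three fields; read_chart_list is a hypothetical name.

```python
import csv


def read_chart_list(lines):
    """Parse 'folder|subfolder|file.pdf' lines into (folder, subfolder, file) tuples.

    `lines` can be an open file object or any iterable of strings.
    Blank lines are skipped; rows without exactly three fields raise ValueError.
    """
    entries = []
    for row_no, row in enumerate(csv.reader(lines, delimiter="|"), start=1):
        if not row:  # csv.reader yields [] for an empty line
            continue
        if len(row) != 3:
            raise ValueError(f"row {row_no}: expected 3 fields, got {len(row)}: {row!r}")
        # strip stray spaces around each field
        folder, subfolder, file_name = (field.strip() for field in row)
        entries.append((folder, subfolder, file_name))
    return entries
```

With a real file, open it with newline='' as the csv module documentation recommends, for example: with open(csv_path, newline='') as f: rows = read_chart_list(f)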
Here is my working, but snail-slow, code:

import csv
import os
import shutil

csv_path = input('Enter path to chart list text file: ')
csv_path = csv_path.replace("\\", "/")
csv_path = csv_path.replace("\"", "")

base_path = input('Enter base path where charts are to be copied FROM: ')
base_path = base_path.replace("\\", "/")
base_path = base_path.replace("\"", "")

destination_path = input('Enter path where the files should be copied TO: ')
destination_path = destination_path.replace("\\", "/")
destination_path = destination_path.replace("\"", "")

def find_file(base_path, file_name):
    # Walk the whole tree under base_path until a file with this name turns up.
    # This runs a full os.walk for every row, which is why it is slow.
    for root, dirs, files in os.walk(base_path):
        if file_name in files:
            return os.path.join(root, file_name)
    # Not found: return None so the "if file_path:" check below works
    return None

with open(csv_path, 'r', newline='') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='|')
    for row in csv_reader:
        if not row:  # skip blank lines
            continue
        print("CSV Row:", row)
        _, _, file_name = row
        file_path = find_file(base_path, file_name)
        if file_path:
            print("Found File Path:", file_path)
            print("Copying file to:", destination_path)
            shutil.copy(file_path, destination_path)
        else:
            print("File not found!")
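Most of the time in this code goes to find_file, which starts a fresh os.walk of the whole tree for every row. If the folder and subfolder columns cannot be relied on, one alternative (a different technique from the one in the answer below) is to walk the tree once and build a dictionary from file name to full path. This is a sketch that assumes PDF names are unique across the tree; build_pdf_index is a hypothetical name.

```python
import os


def build_pdf_index(base_path):
    """Walk base_path once and map each file name to its full path.

    Assumption: file names are unique across the tree. If the same name
    appears in several subfolders, only the first one os.walk visits is kept.
    """
    index = {}
    for root, _dirs, files in os.walk(base_path):
        for file_name in files:
            # setdefault keeps the first path seen for a duplicated name
            index.setdefault(file_name, os.path.join(root, file_name))
    return index
```

Build it once before the loop with index = build_pdf_index(base_path), then use file_path = index.get(file_name) inside the loop. dict.get returns None for a missing name, so the existing "if file_path:" check keeps working.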


Answer by tripleee:

Your problem description is rather hard to understand, but the basic problem seems to be that you run os.walk over the same file tree again for every file in your list. The obvious optimization is to walk it only once and check each file it finds against all the names you are looking for. Something like this?

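import os

def find_em_all(paths, basedir):
    # Walk the tree once. For each file, build "folder/subfolder/file" from the
    # last two components of its directory and yield the full path if wanted.
    for root, subdirs, files in os.walk(basedir):
        head = root.split(os.sep)[-2:]
        for file in files:
            pathtail = os.path.join(*head, file)
            if pathtail in paths:
                yield os.path.join(root, file)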

Call it like this:

needles = [
    'one/wanted/path.pdf',
    'another/desired/file.pdf'
]
for found in find_em_all(needles, '/your/haystack/directory'):
    ... # do something with found
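To drive find_em_all from the pipe-delimited list in the question, the needles can be built from the parsed rows. This sketch assumes the rows are (folder, subfolder, file_name) tuples; needles_from_rows is a hypothetical helper. It uses os.path.join because find_em_all builds pathtail with os.path.join, so hard-coded "/" needles like the ones above would not match on Windows, where the separator is a backslash. Returning a set also turns each "pathtail in paths" check into a hash lookup instead of a list scan.

```python
import os


def needles_from_rows(rows):
    """Turn (folder, subfolder, file_name) rows into platform-native relative paths.

    os.path.join inserts os.sep, matching how find_em_all builds pathtail.
    Duplicate rows collapse because the result is a set.
    """
    return {os.path.join(folder, subfolder, file_name) for folder, subfolder, file_name in rows}
```

Usage, assuming rows was parsed from the list and shutil is imported: for found in find_em_all(needles_from_rows(rows), base_path): shutil.copy(found, destination_path)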

The code in find_em_all basically requires every search expression to lie entirely within the directory tree you traverse, or the starting directory to be an absolute path. For example, if your current directory is /home/you and you have a file /home/you/want/this.pdf, find_em_all(['you/want/this.pdf'], '.') won't find it, because the walk only sees ./want/this.pdf, so the two directory components it compares are . and want. There is also a corner case: if you start in a directory whose absolute path is too close to the root directory, the split cannot produce two real directory parts.
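Both effects come from how pathtail is built. The sketch below reproduces just that step with posixpath (a "/"-separated stand-in for os.path, chosen so the results are identical on every OS; tail_key is a hypothetical name) to show what gets compared for a few starting points.

```python
import posixpath


def tail_key(root, file_name):
    """Mimic find_em_all's key: last two components of root plus the file name.

    Uses "/" and posixpath explicitly; find_em_all itself uses os.sep and os.path.
    """
    head = root.split("/")[-2:]
    return posixpath.join(*head, file_name)


print(tail_key("/home/you/want", "this.pdf"))  # → you/want/this.pdf (matches the needle)
print(tail_key("./want", "this.pdf"))          # → ./want/this.pdf (walk started at ".")
print(tail_key("/want", "this.pdf"))           # → want/this.pdf (only one real folder part)
```

So passing os.path.abspath(basedir) as the starting directory avoids the "." problem, while a folder sitting directly under the root directory still yields only one directory component.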