I am trying to figure out a faster way to copy PDF files that are spread across multiple subfolders. The code I am using now will find a file if it is anywhere under the main folder, but this takes a very long time given the number of subfolders and files. In effect, my current code has written Python the following letter:
Hiya Python! See that last column in the list I sent ya? Just look in all the folders and subfolders in this location where all of these folders and subfolders exist to find that pdf. When you do, copy and move it to this other location I sent you. Take your time!
The letter I would like to write to Python:
Dear Python, Take this list. The first column is the folder, the second is the subfolder, and the last is the pdf. I'm going to give you a location where all of these folders and subfolders exist. Once in that location, open the first folder, then search for the second folder. When you find the second folder, open this folder and search for the pdf. Once you find the pdf, make a copy and move this to this other location I sent you.
For some reason, I'm having a hard time wrapping my head around the next steps. May I please have your expertise (fairly new PyUser here)?
I have a list of the files needed in a pipe-delimited .txt file:
TESTFOLDER_FROM0|TESTSUBFOLDER_FROM0|TEST1.pdf
TESTFOLDER_FROM0|TESTSUBFOLDER_FROM0|TEST2.pdf
TESTFOLDER_FROM1|TESTSUBFOLDER_FROM1|TEST3.pdf
TESTFOLDER_FROM2|TESTSUBFOLDER_FROM2|TEST4.pdf
TESTFOLDER_FROM3|TESTSUBFOLDER_FROM5|TEST5.pdf
TESTFOLDER_FROM5|TESTSUBFOLDER_FROM8|TEST6.pdf
TESTFOLDER_FROM637|TESTSUBFOLDER_FROM11|TEST7.pdf
Here is the working snail code:
import csv
import os
import shutil
csv_path = input('Enter path to chart list text file: ')
csv_path = csv_path.replace("\\", "/")
csv_path = csv_path.replace("\"", "")
base_path = input('Enter base path where charts are to be copied FROM: ')
base_path = base_path.replace("\\", "/")
base_path = base_path.replace("\"", "")
destination_path = input('Enter path where the files should be copied TO: ')
destination_path = destination_path.replace("\\", "/")
destination_path = destination_path.replace("\"", "")
def find_file(base_path, file_name):
    for root, dirs, files in os.walk(base_path):
        if file_name in files:
            return os.path.join(root, file_name)
    return os.path.join(root, file_name)
find_file(base_path, csv_path)
with open(csv_path, 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter='|')
    for row in csv_reader:
        print("CSV Row:", row)
        _, _, file_name = row
        file_path = find_file(base_path, file_name)
        if file_path:
            print("Found File Path:", file_path)
            print("Copying file to:", destination_path)
            shutil.copy(file_path, destination_path)
        else:
            print("File not found!")
Your problem description is rather hard to understand, but the basic problem seems to be that you repeatedly run os.walk on the same file tree. The obvious optimization is to only run it once, and have it look for all the files you want to look for. Something like this?
Call it like
The code in find_em_all basically requires all the search expressions to be entirely within the directory tree you traverse, or the starting directory to be an absolute path (so if your current directory is /home/you and you have a file /home/you/want/this.pdf, it won't find it with find_em_all('you/want/this.pdf', '.') because it only sees ./want/this.pdf). There is also a corner case if you start it in a directory whose absolute path is too near the root directory to produce two parts from the split.
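As a rough illustration of the single-pass idea, here is a minimal sketch. It is not the find_em_all function discussed above: the name copy_listed_files and its three parameters are hypothetical. It assumes every line of the list has exactly three pipe-separated fields and that the destination directory already exists.

```python
import os
import shutil
from collections import defaultdict


def copy_listed_files(list_path, base_path, destination_path):
    """Walk base_path once and copy every file named in the pipe-delimited list.

    Hypothetical helper: assumes lines look like FOLDER|SUBFOLDER|FILE.pdf.
    """
    # Map (folder, subfolder) -> set of wanted file names.
    wanted = defaultdict(set)
    with open(list_path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            folder, subfolder, file_name = line.split("|")
            wanted[(folder, subfolder)].add(file_name)

    copied = []
    for root, dirs, files in os.walk(base_path):
        # The last two components of the current directory identify
        # the folder/subfolder pair from the list.
        parent, subfolder = os.path.split(os.path.normpath(root))
        folder = os.path.basename(parent)
        names = wanted.get((folder, subfolder))
        if not names:
            continue
        # Copy only the wanted files that actually exist in this directory.
        for file_name in names.intersection(files):
            shutil.copy(os.path.join(root, file_name), destination_path)
            copied.append(file_name)
    return sorted(copied)
```

The tree is traversed once no matter how many rows the list has, and the returned list of names can be compared against the input to spot files that were not found. If the listed folders always sit directly under the base path, os.path.join(base_path, folder, subfolder, file_name) would probably avoid the walk altogether.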