I have a directory structure on my filesystem, like this:
```
folder_to_scan/
    important_file_a
    important_file_b
    important_folder_a/
        important_file_c
    important_folder_b/
        important_file_d
    useless_folder/
        ...
```
I want to recursively scan through `folder_to_scan/` and collect all the file names, while ignoring `useless_folder/` and anything under it.
If I do something like this:
```python
from pathlib import Path

path_to_search = Path("folder_to_scan")
files = [
    pth
    for pth in path_to_search.rglob("*")
    if pth.is_file()
    and "useless_folder" not in [parent.name for parent in pth.parents]
]
```
It will work (probably; I didn't bother trying), but the problem is that `useless_folder/` contains millions of files, and `rglob` will still traverse all of them, take ages, and only apply the filter when constructing the final list.

Is there a way to tell Python not to waste time traversing useless folders (`useless_folder/` in my case)?
You can easily write your own file iterator using recursion.
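Here's a minimal sketch of that idea, assuming the directory names to prune are known up front; `iter_files` and `skip_names` are hypothetical names used for this illustration, not from any library:

```python
from pathlib import Path
from typing import Iterator


def iter_files(root: Path, skip_names: set[str]) -> Iterator[Path]:
    """Yield all files under root, pruning any directory whose name is in skip_names."""
    for entry in root.iterdir():
        if entry.is_dir():
            # Prune the entire subtree before descending into it.
            if entry.name not in skip_names:
                yield from iter_files(entry, skip_names)
        elif entry.is_file():
            yield entry


files = list(iter_files(Path("folder_to_scan"), {"useless_folder"}))
```

Because the name check happens before the recursive call, `iterdir()` is never invoked on `useless_folder/`, so its millions of entries are never touched. The same pruning is also possible with the standard library's `os.walk` (in top-down mode) by editing its `dirnames` list in place.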