I have a single directory full of millions of files with names such as:
234.txt
235.txt
236.txt
I would like to work through only the files whose name has an integer prefix above a certain value. That value is determined by the last file processed in a previous run and is fetched from a database.
At the minute I have:
import os
import re

for root, dirs, files in os.walk(directory):
    for filename in files:
        # The integer prefix is the part of the name before the first "."
        if int(re.split(r"\.", filename)[0]) > last_processed_id:
            <do some thing with file>
But I have hundreds of thousands of files, so this approach spends a lot of time on pointless work, checking whether each filename has already been processed. Is there a faster/better way to limit the files returned from os.walk(), short of moving the files once they have been processed?
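
For reference, here is a minimal sketch of one variation I could try. It assumes the directory is flat (no subdirectories) and that the files of interest are all named <integer>.txt. os.scandir() yields entries lazily instead of building the complete file list that os.walk() returns for the directory. It still has to look at every name, so it may only reduce overhead rather than skip the old files. The function name iter_unprocessed is hypothetical.

```python
import os


def iter_unprocessed(directory, last_processed_id):
    """Yield paths of files whose integer prefix exceeds last_processed_id."""
    with os.scandir(directory) as entries:
        for entry in entries:
            # Take the part of the name before the first "."
            prefix = entry.name.partition(".")[0]
            try:
                file_id = int(prefix)
            except ValueError:
                # Name has no integer prefix, so ignore it
                continue
            if file_id > last_processed_id:
                yield entry.path
```

Usage would be along the lines of `for path in iter_unprocessed(directory, last_processed_id):`, followed by the per-file processing. Entries come back in arbitrary order, so they would need sorting if the order of processing matters.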