I have a requirement to get the file details for certain locations (on the local filesystem and on SFTP) and to get the file size for some SFTP locations, which I currently do with the code below.
import glob
import pysftp

def getFileDetails(location: str):
    filenames: list = []
    # A path containing ":" (e.g. a Windows drive path) is treated as local;
    # everything else is treated as an SFTP location.
    if location.find(":") != -1:
        for file in glob.glob(location):
            filenames.append(getFileNameFromFilePath(file))
    else:
        with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword) as sftp:
            # List the directory once, sorted by modification time.
            remote_files = [x.filename for x in sorted(sftp.listdir_attr(location), key=lambda f: f.st_mtime)]
            if location == LOCATION_SFTP_A:
                for filename in remote_files:
                    filenames.append(filename)
                    sftp_archive_d_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
            elif location == LOCATION_SFTP_B:
                for filename in remote_files:
                    filenames.append(filename)
                    sftp_archive_e_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
            else:
                for filename in remote_files:
                    filenames.append(filename)
            # The "with" block closes the connection automatically; no explicit sftp.close() needed.
    return filenames
There are more than 10,000 files in LOCATION_SFTP_A and LOCATION_SFTP_B. For each file, I need the size, which I currently get with:
sftp_archive_d_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
sftp_archive_e_size_mapping[filename] = sftp.stat(location + "/" + filename).st_size
# Time taken: 5+ min

sftp_archive_d_size_mapping[filename] = 1  # sftp.stat(location + "/" + filename).st_size
sftp_archive_e_size_mapping[filename] = 1  # sftp.stat(location + "/" + filename).st_size
# Time taken: 20-30 s
If I comment out the sftp.stat(location + "/" + filename).st_size calls and assign a static value instead, the entire code takes only 20-30 seconds to run. How can I optimize the runtime while still getting the file size details?
Connection.listdir_attr already gives you the file size in SFTPAttributes.st_size, so there is no need to call Connection.stat for each file to fetch the size again. Each stat call is a separate request/response round trip to the server, which is why 10,000+ of them take minutes.
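A minimal sketch of that single-pass approach, reusing the myHostname credentials, LOCATION_SFTP_* constants, and size-mapping dictionaries from the question:

import pysftp

with pysftp.Connection(host=myHostname, username=myUsername, password=myPassword) as sftp:
    # One listdir_attr call returns filename, mtime, and size for every entry,
    # so no per-file stat round trips are needed.
    attrs = sorted(sftp.listdir_attr(location), key=lambda f: f.st_mtime)
    filenames = [a.filename for a in attrs]
    if location == LOCATION_SFTP_A:
        for a in attrs:
            sftp_archive_d_size_mapping[a.filename] = a.st_size
    elif location == LOCATION_SFTP_B:
        for a in attrs:
            sftp_archive_e_size_mapping[a.filename] = a.st_size

Since this reuses the listdir_attr call the code already makes, the SFTP branch should run in roughly the same 20-30 seconds as the static-value version.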