Find a file on an SFTP server that contains a given string in Python


Problem

Using Python, I need to find an XML file on an external SFTP server that contains the following tag:

<myTag>BLA</myTag>

The problem is there could be more than 5000 files on the server.

Is there a way to do it efficiently?

Right now I am:

  • using pysftp to connect to the SFTP server
  • gathering all filenames from it and filtering them to keep only .xml files
  • going over the files with conn.getfo(file_name, file) to download each one and search its content

Code

import stat
import tempfile

import pysftp

cnopts = pysftp.CnOpts()
cnopts.hostkeys = None  # note: this disables host key verification

conn = pysftp.Connection('...', username='...', password='..', port=22, cnopts=cnopts)


def get_all_filenames(root, skip_path=None):
    """Recursively collect the paths of all regular files below root."""
    all_file_names = []
    for file_attr in conn.listdir_attr(root):
        filepath = f'{root}/{file_attr.filename}'.replace('//', '/')
        if stat.S_ISREG(file_attr.st_mode):
            all_file_names.append(filepath)
        elif stat.S_ISDIR(file_attr.st_mode):
            if skip_path is None or skip_path != filepath:
                all_file_names += get_all_filenames(filepath, skip_path)
    return all_file_names


# root is the remote directory to start from
all_file_names = get_all_filenames(root)

for file_name in (f_name for f_name in all_file_names if f_name.endswith('.xml')):
    with tempfile.TemporaryFile(mode='wb+') as file:
        # download the remote file into the temporary file, then read it back
        conn.getfo(file_name, file)
        file.seek(0)
        content = file.read()
        # find right tag...

This code takes around 0.15 s to open each file. With 5000 files, that adds up to about 12.5 minutes (5000 × 0.15 s = 750 s) to go over all of them.
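For the "find right tag" step, a minimal sketch of a chunked search is shown here. It reads a binary file-like object piece by piece and stops at the first match, so the whole file does not have to be held in memory. The function name stream_contains, the NEEDLE constant and the chunk size are assumptions made for illustration; they are not part of the code above.

```python
from io import BytesIO

# The tag from the question, as bytes (files are read in binary mode).
NEEDLE = b'<myTag>BLA</myTag>'


def stream_contains(fileobj, needle=NEEDLE, chunk_size=32768):
    """Return True as soon as needle is seen in a binary file-like object."""
    tail = b''
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            return False  # end of file reached without a match
        window = tail + chunk
        if needle in window:
            return True
        # Keep the last len(needle) - 1 bytes so that a match straddling
        # two chunks is still found on the next iteration.
        tail = window[-(len(needle) - 1):] if len(needle) > 1 else b''


# Local stand-in for a downloaded file, so the sketch runs without a server.
sample = BytesIO(b'<root><myTag>BLA</myTag></root>')
found = stream_contains(sample)
```

With pysftp, the object returned by conn.open(file_name, mode='rb') could probably be passed in instead of the BytesIO. Whether stopping early saves any time depends on the file sizes and on where the tag appears; for small files the per-file round trip is likely to dominate.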

Question

How can I optimize this?

Prerequisites

  • The SFTP server is outside of my domain
  • I do not have SSH permissions
  • Downloading and storing all files on my server (file sync) is not a viable option
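Within these constraints, one direction that might be worth measuring is fetching several files at the same time, so that the per-file latency overlaps instead of adding up. The following is only a sketch under assumptions: find_matching_files and the fetch callable are hypothetical names, and it assumes the server tolerates several simultaneous downloads. A dictionary stands in for the server so the example runs without a connection.

```python
from concurrent.futures import ThreadPoolExecutor

NEEDLE = b'<myTag>BLA</myTag>'


def find_matching_files(file_names, fetch, needle=NEEDLE, workers=8):
    """Return the names whose content contains needle.

    fetch is a hypothetical callable, fetch(name) -> bytes, that
    downloads one file and returns its content.
    """
    def check(name):
        return name, needle in fetch(name)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as file_names.
        return [name for name, hit in pool.map(check, file_names) if hit]


# Stand-in for the SFTP server (assumption, for illustration only).
fake_server = {
    '/a.xml': b'<root><myTag>BLA</myTag></root>',
    '/b.xml': b'<root><myTag>FOO</myTag></root>',
}
matches = find_matching_files(fake_server, fake_server.__getitem__)
```

A real fetch could, for example, open one pysftp.Connection per worker thread and call conn.getfo(name, buffer) with an io.BytesIO buffer. Sharing a single connection between threads is not assumed to be safe here, and the actual speed-up, if any, would have to be measured against the server in question.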
