I'm encountering difficulties when attempting to retrieve a list of files from an SFTP directory in Python (pysftp package). The error is observed at the following code line:
file_list = sftp.listdir_attr(remote_directory)
The error message:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 17: invalid start byte
I've looked at the file naming convention in the directory, and although nothing looks unusual, we do use characters like [ ] ( ) $ @ ! - _ + = etc. We would like to keep the filenames on the SFTP server as they are (they are uploaded by the client, and we need to keep them unchanged for their mapping).
I also tried using paramiko, which gave the same result. I also tried changing the code as follows:
file_list = sftp.listdir_attr(remote_directory.encode('utf-8', errors='ignore').decode())
The problem persists. Other workarounds are not options for us: ignoring the directory entirely is not feasible, and decoding the filenames after that line isn't possible because the error is raised on that very line. Are there any alternative approaches or suggestions for handling this UnicodeDecodeError in our situation?
Additionally, the SFTP server sits behind a firewall that requires IP addresses to be whitelisted before access is granted. However, I don't think this is the issue, as I'm able to access other directories on the same server, and I can also access the affected directories with other tools (Cyberduck).
I would appreciate any help. Thank you!
An entry in remote_directory has a filename which is a byte string that is not valid UTF-8, so your SFTP client raises a UnicodeDecodeError when it tries to decode it as UTF-8. The solution is to use a different encoding in which the byte string is valid. As a quick fix, configure your SFTP client to use the ISO-8859-1 encoding for filenames instead of the default UTF-8. Because ISO-8859-1 can decode any byte sequence, this will make the listing succeed. (From a quick look at the pysftp documentation, pysftp supports only UTF-8, so the change has to be made in the underlying paramiko code instead.)
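To see why ISO-8859-1 works as a quick fix, here is a stdlib-only illustration using a made-up filename that contains a lone 0xa0 byte at position 17 (the real names on your server will differ). UTF-8 rejects 0xa0 as a start byte, whereas ISO-8859-1 assigns a character to every one of the 256 byte values, so decoding cannot fail and encoding the result again returns the original bytes.

```python
from typing import Optional

# Made-up raw filename: 0xa0 is a no-break space in ISO-8859-1/cp1252,
# but on its own it is not a valid UTF-8 start byte.
raw_name = b"Client [A] (v2.1)\xa0$data.csv"


def decode_utf8_or_none(raw: bytes) -> Optional[str]:
    """Return raw decoded as UTF-8, or None (after printing the error) if it is invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Prints: 'utf-8' codec can't decode byte 0xa0 in position 17: invalid start byte
        print(exc)
        return None


utf8_name = decode_utf8_or_none(raw_name)

# ISO-8859-1 (latin-1) maps each byte value to exactly one character,
# so this decode never raises.
latin1_name = raw_name.decode("iso-8859-1")

# The mapping is one-to-one, so the original bytes can always be recovered.
round_trip = latin1_name.encode("iso-8859-1")

print(repr(latin1_name))
print(round_trip == raw_name)  # True
```

The round trip is what makes ISO-8859-1 safe as a stopgap: the name may look odd on screen, but no information about the client's original filename is lost.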
You may want to change the encoding like this:
(paramiko.util.u is used by paramiko.message.Message.get_text, which is used by paramiko.sftp_client.SFTPClient.listdir_attr, which in turn is used by .listdir_attr in pysftp.) You may need a different encoding to get the non-ASCII characters in the filenames right, and you have to guess which encoding it is. Ask a separate question if you need help guessing.
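If you want to narrow down the guess yourself, one low-tech approach (an assumption about how you might proceed, not something paramiko does for you) is to take the raw bytes of a problematic name, for example name.encode('iso-8859-1') after the ISO-8859-1 workaround, and see how several plausible encodings render them. The candidate list and the candidate_decodings helper below are illustrative only.

```python
from typing import Dict, Iterable

# Example candidates only; pick ones that match where the client's files come from.
DEFAULT_CANDIDATES = ("utf-8", "cp1252", "iso-8859-15", "cp437")


def candidate_decodings(raw: bytes, candidates: Iterable[str] = DEFAULT_CANDIDATES) -> Dict[str, str]:
    """Map each candidate encoding to its decoding of raw, skipping encodings that fail."""
    results: Dict[str, str] = {}
    for encoding in candidates:
        try:
            results[encoding] = raw.decode(encoding)
        except UnicodeDecodeError:
            # raw is not valid in this encoding, so it cannot be the right one
            continue
    return results


raw_name = b"Client [A] (v2.1)\xa0$data.csv"
for encoding, text in candidate_decodings(raw_name).items():
    print(f"{encoding:12} {text!r}")
```

Several encodings will usually "succeed" (cp437 and ISO-8859-1 accept any byte sequence), so a successful decode only rules encodings in; the deciding factor is which rendering matches the filename the client expects.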