Get the latest version of a group of files with similar name using the get statement for download

Currently I am using the following command to download an entire directory from an SFTP server to our own. The problem is that this directory grows every day and most of the files in it aren't needed. So today I download the entire folder and then delete the unnecessary files.

But our customer is not enjoying this solution because it leads to heavy file transfers (which they are paying for).

The current version:

sshpass -p "$FTP_PASS" sftp -o StrictHostKeyChecking=no -o HostKeyAlgorithms=+ssh-dss [USERNAME]@[SFTP_DOMAIN].com <<EOF

get -r Export
EOF

I'd like to improve this script so that, instead of downloading the entire folder, it searches for files whose names start with a specific string and downloads only the latest version of each.

E.g.

We are looking for the latest versions of the ones that start with either Subscribers_Extracts or Clicks or Account_Extract and we have the following list in the directory:

Subscribers_Extracts_1.csv 
Subscribers_Extracts_2.csv
Subscribers_Extracts_3.csv
Subscribers_Extracts_4.csv (latest modified)
Clicks_ftyftyf.csv
Clicks_67546754675.csv (latest modified)
Clicks_783635ghgh.csv 
Account_Extract_uguyfuyfuf.csv

Then the files we should download are going to be

Subscribers_Extracts_4.csv
Clicks_67546754675.csv
Account_Extract_uguyfuyfuf.csv

Note that we picked the files based on their modification date and not on the numbers in their names.

Also note that Account_Extract_uguyfuyfuf.csv is the only file matching the third pattern, so we download it regardless of its modification date.

How can I save my customer a lot of data transfer?

There is 1 answer

Irene Marzuoli On

rsync can be restricted to recently modified files (for example by feeding it a file list produced by find), but for something more flexible you can read a file's last modification time, in seconds since the epoch, with:

date +%s -r filename

Then run that check in a loop for each filename root (i.e. for every f in Subscribers_Extract*), keeping the filename with the highest timestamp.
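The loop described above could look roughly like the sketch below. It compares files with the shell's -nt ("newer than") test instead of calling date, so it assumes a shell whose test builtin supports -nt (bash does). The function name latest_match is made up for this illustration.

```shell
# latest_match PREFIX: print the most recently modified file named PREFIX*
# in the current directory. Prints nothing if no file matches.
# (latest_match is a hypothetical helper name, not part of any tool.)
latest_match() {
    prefix=$1
    latest=""
    for f in "$prefix"*; do
        # An unmatched glob stays literal, so skip names that do not exist.
        [ -e "$f" ] || continue
        # Keep f if it is the first match or newer than the best so far.
        if [ -z "$latest" ] || [ "$f" -nt "$latest" ]; then
            latest=$f
        fi
    done
    if [ -n "$latest" ]; then
        printf '%s\n' "$latest"
    fi
}
```

Run from inside the export directory, one call per root from the question (latest_match Subscribers_Extracts, latest_match Clicks, latest_match Account_Extract) would print the three names to fetch.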

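However, date -r with a file name doesn't work on OS X systems, where -r has traditionally expected a number of seconds; stat -f %m filename is an alternative there.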
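To avoid depending on date -r, the modification time can also be read with stat. The sketch below is only an assumption-laden helper: it tries the GNU form (stat -c %Y) first and falls back to the BSD/OS X form (stat -f %m). The name mtime_of is invented for this example.

```shell
# mtime_of FILE: print FILE's modification time in seconds since the epoch.
# Tries GNU stat first; if that fails (e.g. on BSD/OS X), tries BSD stat.
# (mtime_of is a hypothetical helper name.)
mtime_of() {
    stat -c %Y -- "$1" 2>/dev/null || stat -f %m -- "$1"
}
```

It can stand in for date +%s -r in a comparison loop, e.g. date_tmp=$(mtime_of "$f").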

EDITED

If you can ssh into the remote server and run a bash script, the following prints the name of the most recently modified "Subscribers_..." file, which you can then copy:

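#!/bin/bash
# Print the most recently modified file matching Subscribers_Extract*.
shopt -s nullglob                  # an unmatched glob expands to nothing
date_modify=0
file_to_copy="none"
# Iterate over the glob directly; parsing ls output breaks on odd file names.
for f in Subscribers_Extract*
do
    # Modification time in seconds since the epoch (GNU date).
    date_tmp=$(date +%s -r "$f")
    if (( date_tmp > date_modify ))
    then
        date_modify=$date_tmp
        file_to_copy=$f
    fi
done
echo "$file_to_copy"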
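
Once the file names are known, they still have to be turned into individual downloads. One possible way, sketched here under the assumption that the names arrive one per line on standard input, is to generate one sftp get command per name. The helper name make_get_batch is made up, and Export is the directory from the question.

```shell
# make_get_batch DIR: read file names on stdin (one per line) and print
# one sftp "get" command per name, with the remote path quoted.
# Empty lines and the placeholder "none" are skipped.
# (make_get_batch is a hypothetical helper name.)
make_get_batch() {
    dir=$1
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        [ "$name" != "none" ] || continue
        printf 'get "%s/%s"\n' "$dir" "$name"
    done
}
```

The generated lines could then be fed to the sftp command from the question in place of the single get -r Export, so that only the selected files are transferred.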