I have a workflow that moves 700 GB of files from an FTP server to an on-prem server for Python script processing. I would like to migrate this process to an AWS S3 bucket so that a Lambda function can process the files.

I saw AWS DataSync as a reasonably priced solution ($0.0125/GB) for moving this data into an S3 bucket, but it doesn't transfer from an FTP site. Does anyone have suggestions on how to do this?

Note: I've looked into FileZilla Pro, but there is no way to automate that process with a batch command or scripting.
AWS Lambda is not a good choice for such a job because of its varying memory requirements and the unpredictable latency between your FTP site and the Lambda function.
It looks like you are trying to copy 700 GB of data into S3 via some AWS service. If that's correct, then please do serious cost calculations for the following (a rough back-of-the-envelope sketch follows this list):

1. S3 pricing is a function of how much data you store and transfer and how often you retrieve it. Reading and writing 700 GB of data can add up to a significant monthly cost.

2. Lambda function execution time and memory. Whenever the Lambda function runs, it will read the file into a temporary in-memory buffer. This is where the cost gets high, because Lambda pricing depends on the amount of memory allocated and how long the function runs.

3. The connection speed between the FTP site and Lambda is also worth considering: the higher the latency, the longer each invocation runs and the faster you will exhaust the Lambda free tier (1M requests and 400,000 GB-seconds of compute per month).
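To make those calculations concrete, here is a rough cost sketch. The unit prices are assumptions based on public us-east-1 list prices at the time of writing, and the per-invocation memory/duration figures are purely illustrative, so plug in your own numbers:

```python
# Rough cost sketch. Unit prices are assumptions (us-east-1 list prices);
# the Lambda invocation count, memory and duration are made-up examples.
TOTAL_GB = 700

S3_STORAGE_PER_GB_MONTH = 0.023      # S3 Standard, first 50 TB tier (assumed)
DATASYNC_PER_GB = 0.0125             # DataSync per-GB fee from the question
LAMBDA_PER_GB_SECOND = 0.0000166667  # Lambda compute price (assumed)

storage_per_month = TOTAL_GB * S3_STORAGE_PER_GB_MONTH
datasync_one_time = TOTAL_GB * DATASYNC_PER_GB

# Illustrative Lambda bill: 1,000 invocations/month, each using 3 GB of
# memory for 60 seconds (all assumed numbers).
lambda_per_month = 1_000 * 3 * 60 * LAMBDA_PER_GB_SECOND

print(f"S3 Standard storage : ~${storage_per_month:.2f}/month")
print(f"DataSync copy       : ~${datasync_one_time:.2f} one-time")
print(f"Lambda compute      : ~${lambda_per_month:.2f}/month")
```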
I would recommend running a Python/Ruby/PHP script either on the FTP server or on an on-premise machine to upload the files to S3 buckets; a minimal sketch is below. If you go with that approach, also think about archiving the data to Glacier so that you save on storage costs.
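Here is a minimal sketch of what such a script could look like in Python using ftplib and boto3. The FTP host, credentials, bucket and directory names are placeholders, and it assumes your AWS credentials are already configured for boto3:

```python
# Minimal FTP -> S3 copy sketch. FTP_HOST, FTP_USER, FTP_PASS, REMOTE_DIR
# and BUCKET are placeholders; boto3 picks up AWS credentials from the
# environment or ~/.aws/credentials.
import ftplib
import tempfile

import boto3

FTP_HOST = "ftp.example.com"
FTP_USER = "user"
FTP_PASS = "password"
REMOTE_DIR = "/outgoing"
BUCKET = "my-target-bucket"

s3 = boto3.client("s3")

with ftplib.FTP(FTP_HOST) as ftp:
    ftp.login(FTP_USER, FTP_PASS)
    ftp.cwd(REMOTE_DIR)
    for name in ftp.nlst():
        # Stream each file to a temporary file on disk so large files
        # don't have to fit in memory, then hand it to S3.
        with tempfile.TemporaryFile() as tmp:
            ftp.retrbinary(f"RETR {name}", tmp.write)
            tmp.seek(0)
            # upload_fileobj handles multipart uploads for big objects;
            # add ExtraArgs={"StorageClass": "GLACIER"} to archive directly.
            s3.upload_fileobj(tmp, BUCKET, name)
        print(f"copied {name} -> s3://{BUCKET}/{name}")
```

Run it close to the FTP server (or on it) and schedule it with cron or Task Scheduler; that gives you the automation you couldn't get from FileZilla Pro.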
If you need Lambda code, please let me know and I will be happy to share it with you. Hope this helps.