I need to copy a large number of small files from one S3 bucket to another. I'm using the s3-dist-cp command provided by AWS:
s3-dist-cp --src=s3://some-bucket/ --dest=s3://another-bucket/ --groupBy=<some-pattern> --targetSize=<size> --deleteOnSuccess
The problem with this command is that it takes a very long time to copy and merge all the small files.
Note: another job is continuously writing new files to the source bucket, and I suspect s3-dist-cp never catches up with the latest files.
Is there any workaround for this problem? The destination bucket will be read by a Spark job that processes these files.
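For context, here is a rough sketch of how I understand --groupBy to decide which files get merged: keys whose regex capture groups yield the same value end up in the same output file. The key names and the pattern below are hypothetical placeholders (not my real ones), and treating the pattern as a full-path match and skipping non-matching keys are assumptions on my part, not confirmed s3-dist-cp behavior.

```python
import re
from collections import defaultdict


def group_keys(keys, pattern):
    """Bucket keys by the concatenation of the pattern's capture groups.

    Assumption: the pattern must match the whole key, and keys that do
    not match are left out of every group.
    """
    regex = re.compile(pattern)
    groups = defaultdict(list)
    for key in keys:
        match = regex.fullmatch(key)
        if match is None:
            continue  # non-matching keys are skipped in this sketch
        groups["".join(match.groups())].append(key)
    return dict(groups)


# Hypothetical keys and pattern, for illustration only.
example_keys = [
    "logs/2024-01-01/part-0001.json",
    "logs/2024-01-01/part-0002.json",
    "logs/2024-01-02/part-0001.json",
    "logs/readme.txt",
]
example_pattern = r".*/(\d{4}-\d{2}-\d{2})/.*\.json"

for group, members in group_keys(example_keys, example_pattern).items():
    print(group, len(members))
```

With a pattern like this, each date would become one merged output file, so the number and size of groups depend heavily on which part of the key is captured.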