How to copy files from s3 to s3 same folder?

251 views Asked by At

I am trying to combine log files from s3 to s3 using the following command.

s3-dist-cp --src s3://path/to/ym=2020/ --dest s3://path/to/ym=2020/ --groupBy='.*/(\d{8}).+(\.json) --deleteOnSuccess'

I have the following files in

s3://path/to/ym=2020/20200802010010.json
s3://path/to/ym=2020/20200802010020.json
s3://path/to/ym=2020/20200802010030.json
s3://path/to/ym=2020/20200802010040.json
s3://path/to/ym=2020/20200802010050.json
s3://path/to/ym=2020/20200803010010.json
s3://path/to/ym=2020/20200803010020.json
s3://path/to/ym=2020/20200803010030.json
s3://path/to/ym=2020/20200803010040.json
s3://path/to/ym=2020/20200803010050.json

expected result is

s3://path/to/ym=2020/20200802.json
s3://path/to/ym=2020/20200803.json

but I get the following errors

20/08/04 03:29:14 INFO s3distcp.S3DistCp: Created 1 files to copy 856 files
Exception in thread "main" java.lang.NullPointerException
    at com.amazon.ws.emr.hadoop.fs.s3n.S3NativeFileSystem.mkdirs(S3NativeFileSystem.java:1052)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1961)
    at com.amazon.ws.emr.hadoop.fs.EmrFileSystem.mkdirs(EmrFileSystem.java:443)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:893)
    at com.amazon.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:728)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at com.amazon.elasticmapreduce.s3distcp.Main.main(Main.java:22)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

The following command is working.(change dest path to other path)

s3-dist-cp --src s3://path/to/ym=2020/ --dest s3://path/to/archives/ym=2020/ --groupBy='.*/(\d{8}).+(\.json) --deleteOnSuccess'
0

There are 0 answers