I have an issue related to a similar question asked before. I'm unable to copy data from HDFS to an S3 bucket in IBM Cloud.
I use the command: hadoop distcp hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
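For reference, the equivalent invocation with the S3A properties passed as -D generic options instead of core-site.xml would look roughly like this (a sketch; the key values are placeholders):

hadoop distcp \
  -Dfs.s3a.access.key=XXX \
  -Dfs.s3a.secret.key=XXX \
  -Dfs.s3a.endpoint=s3.eu-de.cloud-object-storage.appdomain.cloud \
  hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/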
I've added extra properties to the /etc/hadoop/core-site.xml file:
<property>
  <name>fs.s3a.access.key</name>
  <value>XXX</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>XXX</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3.eu-de.cloud-object-storage.appdomain.cloud</value>
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <value>104857600</value>
</property>
I receive the following error message:
root@e05ffff9bac9:/etc/hadoop# hadoop distcp hdfs://namenode:9000/user/root/data/ s3a://hdfs-backup/
2021-04-29 13:29:36,723 ERROR tools.DistCp: Invalid arguments:
java.lang.IllegalArgumentException
at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1314)
at java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1237)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:280)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3303)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:124)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3352)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3320)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:479)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365)
at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:240)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Invalid arguments: null
Connecting to the S3 bucket with the AWS CLI works fine (see the sketch below). Thanks in advance for your help!
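For completeness, the AWS CLI access that works is along these lines (a sketch; same endpoint and bucket as above, with HMAC credentials already configured for the CLI):

aws --endpoint-url=https://s3.eu-de.cloud-object-storage.appdomain.cloud s3 ls s3://hdfs-backup/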