AWS file upload


I want to upload a few files into an AWS S3 bucket from Hadoop. I have the AWS ACCESS KEY, SECRET KEY and S3 IMPORT PATH.

I am not able to access the bucket through the AWS CLI. I set the keys in the AWS credentials file, but when I run "aws s3 ls" I get this error:

An error occurred (InvalidToken) when calling the ListBuckets operation: The provided token is malformed or otherwise invalid.
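
For reference, my ~/.aws/credentials is laid out roughly like this (the values below are placeholders, not the real keys; since the keys I was given are temporary ones, there is a session token entry as well):

    [default]
    aws_access_key_id = AXXXXXXXXXXQ
    aws_secret_access_key = 4I9nXXXXXXXXXXXXHA
    # temporary credentials also need the matching session token
    aws_session_token = <long session token string>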

Since the above didn't work, I tried the hadoop distcp command as below.

hadoop distcp \
  -Dmapreduce.job.queuename=root.mr.sbg.sla \
  -Dfs.s3a.proxy.host=qypprdproxy02.ie.xxx.net \
  -Dfs.s3a.proxy.port=80 \
  -Dfs.s3a.endpoint=s3.us-west-2.amazonaws.com \
  -Dfs.s3a.aws.credentials.provider="org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider" \
  -Dfs.s3a.access.key="AXXXXXXXXXXQ" \
  -Dfs.s3a.secret.key="4I9nXXXXXXXXXXXXHA" \
  -Dfs.s3a.session.token="FQoDYXdzECkaDNBtHNfS5sKxXqNdMyKeAuqLbVXG72KvcPmUtnpLGbM7UE59zjvNNo0u8mWlslCEvZcZLxXw1agAInzGH8vnGleqxjzuBBgXMXXXXXXXG0zpHA8eyrwCZqUBXSg9cdqevv1sFT8lUIEi5uTGLjHXgkQoBXXXXXXXXXXXXXXt80Rp4vb3P7k5N2AVZmuVvM/SEH/qMLiFabDbVliGXqw7MHXTXXXXXXXXXXXXXXXtW8JvmOFPR3nGdQ4VKzw0deSbNmL/BCivfh9pf7ubm5RFRSLxqcdoT7XAXIWf1jJguEGygcBkFRh2Ztvr8OYcG78hLEJX61ssbKWXokOKTBMnUxx4b0jIG1isXerDaO6RRVJdBrTXn2Somzigo4ZbL0wU=" \
  TXXXX/Data/LiXXXXL/HS/ABC/part-1517397360173-r-00000 \
  s3a://data-import-dev/1012018.csv

For the above command as well, I am getting the error below.

18/11/09 00:55:40 INFO http.AmazonHttpClient: Configuring Proxy. Proxy Host: qypprdproxy02.ie.XXXX.net Proxy Port: 80
18/11/09 00:55:40 WARN s3a.S3AFileSystem: Client: Amazon S3 error 400: 400 Bad Request; Bad Request (retryable)

com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg=
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:312)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2815)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2852)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:205)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:131)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
18/11/09 00:55:40 ERROR tools.DistCp: Invalid arguments:
org.apache.hadoop.fs.s3a.AWSS3IOException: doesBucketExist on segmentor-data-import-dev: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg=: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:178)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:318)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:260)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2815)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:98)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2852)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2834)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:387)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:205)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:131)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Caused by: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg=
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1182)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:770)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
    at com.cloudera.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
    at com.cloudera.com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:312)
    ... 11 more
Invalid arguments: doesBucketExist on segmentor-data-import-dev: com.cloudera.com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0), S3 Extended Request ID: jn/iTngZS83+A5U8e2gjQsyArDC68E+r0q/Sll0gkSCn0h5yDaG17TEb9HNSx7o590hmofguJIg=: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 121931CAB75C3BB0)
usage: distcp OPTIONS [source_path...] <target_path>
              OPTIONS
 -append                 Reuse existing data in target files and append new data to them if possible
 -async                  Should distcp execution be blocking
 -atomic                 Commit all changes or none
 -bandwidth              Specify bandwidth per map in MB
 -delete                 Delete from target, files missing in source
 -diff                   Use snapshot diff report to identify the difference between source and target
 -f                      List of files that need to be copied
 -filelimit              (Deprecated!) Limit number of files copied to <= n
 -filters                The path to a file containing a list of strings for paths to be excluded from the copy.
 -i                      Ignore failures during copy
 -log                    Folder on DFS where distcp execution logs are saved
 -m                      Max number of concurrent maps to use for copy
 -mapredSslConf          Configuration for ssl config file, to use with hftps://. Must be in the classpath.
 -numListstatusThreads   Number of threads to use for building file listing (max 40).
 -overwrite              Choose to overwrite target files unconditionally, even if they exist.
 -p                      preserve status (rbugpcaxt) (replication, block-size, user, group, permission, checksum-type, ACL, XATTR, timestamps). If -p is specified with no <arg>, then preserves replication, block size, user, group, permission, checksum type and timestamps. raw.* xattrs are preserved when both the source and destination paths are in the /.reserved/raw hierarchy (HDFS only). raw.* xattr preservation is independent of the -p flag. Refer to the DistCp documentation for more details.
 -rdiff                  Use target snapshot diff report to identify changes made on target
 -sizelimit              (Deprecated!) Limit number of files copied to <= n bytes
 -skipcrccheck           Whether to skip CRC checks between source and target paths.
 -strategy               Copy strategy to use. Default is dividing work based on file sizes
 -tmp                    Intermediate work path to be used for atomic commit
 -update                 Update target, copying only missing files or directories
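
If it is easier to read, these are the same s3a settings I am passing with -D, written out as core-site.xml style properties (values redacted as in the command above; the session token value is the same long string):

    <!-- s3a settings, equivalent to the -D options in the distcp command above -->
    <property>
      <name>fs.s3a.aws.credentials.provider</name>
      <value>org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider</value>
    </property>
    <property>
      <name>fs.s3a.endpoint</name>
      <value>s3.us-west-2.amazonaws.com</value>
    </property>
    <property>
      <name>fs.s3a.proxy.host</name>
      <value>qypprdproxy02.ie.xxx.net</value>
    </property>
    <property>
      <name>fs.s3a.proxy.port</name>
      <value>80</value>
    </property>
    <property>
      <name>fs.s3a.access.key</name>
      <value>AXXXXXXXXXXQ</value>
    </property>
    <property>
      <name>fs.s3a.secret.key</name>
      <value>4I9nXXXXXXXXXXXXHA</value>
    </property>
    <property>
      <name>fs.s3a.session.token</name>
      <value>FQoDYXdz... (redacted, same token as above)</value>
    </property>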

Please let me know how to achieve this.

1 Answer

Answer by Yossi Cohen (accepted):

I encountered the same problem. This issue can arise when the files inside ~/.aws are modified manually rather than via the "aws configure" command.

Did you try to:

  1. Delete the "config" and "credentials" files (located in ~/.aws)?
  2. Run the "aws configure" command (recreating the files you deleted in step 1)?

That has fixed the problem for me.

This is mainly because I use other tools that also modify these files.
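
If it is useful, here is a minimal sketch of that recovery sequence for the default profile; the session token step is only needed for temporary credentials such as the ones in the question, since "aws configure" does not prompt for a token:

    # remove the manually edited files so the CLI can recreate them cleanly
    rm ~/.aws/config ~/.aws/credentials

    # recreate them; prompts for access key, secret key, default region and output format
    aws configure

    # temporary credentials also need the matching session token (placeholder value shown)
    aws configure set aws_session_token "<session token from your credentials provider>"

    # verify that listing now works
    aws s3 ls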

I hope it helps.