Error with Data Pipeline backup when transferring data from DynamoDB to S3

I have to back up my DynamoDB table to S3, but when I launch this service I receive this error after three attempts:

private.com.amazonaws.AmazonServiceException: User: arn:aws:sts::769870455028:assumed-role/DataPipelineDefaultResourceRole/i-3678d99c is not authorized to perform: elasticmapreduce:ModifyInstanceGroups (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: AccessDeniedException; Request ID: 9065ea77-0f95-11e5-8f35-39a70915a1ef)
    at private.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1077)
    at private.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:725)
    at private.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460)
    at private.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295)
    at private.com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.invoke(AmazonElasticMapReduceClient.java:1391)
    at private.com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.modifyInstanceGroups(AmazonElasticMapReduceClient.java:785)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at private.com.amazonaws.services.datapipeline.retrier.RetryProxy.invokeInternal(RetryProxy.java:36)
    at private.com.amazonaws.services.datapipeline.retrier.RetryProxy.invoke(RetryProxy.java:48)
    at com.sun.proxy.$Proxy33.modifyInstanceGroups(Unknown Source)
    at amazonaws.datapipeline.cluster.EmrUtil.acquireCoreNodes(EmrUtil.java:325)
    at amazonaws.datapipeline.activity.AbstractClusterActivity.resizeIfRequired(AbstractClusterActivity.java:47)
    at amazonaws.datapipeline.activity.AbstractHiveActivity.runActivity(AbstractHiveActivity.java:113)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:132)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:101)
    at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:77)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:745)

How can I do my backup? Has anyone else had this error? Thanks.

Edit: new policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*",
        "dynamodb:*",
        "ec2:Describe*",
        "elasticmapreduce:Describe*",
        "elasticmapreduce:ListInstance*",
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:*",
        "rds:Describe*",
        "datapipeline:*",
        "cloudwatch:*",
        "redshift:DescribeClusters",
        "redshift:DescribeClusterSecurityGroups",
        "sdb:*",
        "sns:*",
        "sqs:*"
      ],
      "Resource": [ "*" ]
    }
  ]
}

This is the new exception:

Error during job, obtaining debugging information...
Examining task ID: task_1434014832347_0001_m_000008 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000013 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000005 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000034 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000044 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000004 (and more) from job job_1434014832347_0001

Task with the most failures(4):
-----
Task ID:
  task_1434014832347_0001_m_000002

URL:
  http://ip-10-37-138-149.eu-west-1.compute.internal:9026/taskdetails.jsp?jobid=job_1434014832347_0001&tipid=task_1434014832347_0001_m_000002
-----
Diagnostic Messages for this Task:
Error: Java heap space

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs

There is 1 answer

AravindR

The Data Pipeline agent (TaskRunner) running on your EMR cluster is trying to resize the cluster, and that call is failing. The resource role that you passed to the EMR cluster does not have permission to invoke the following API: AmazonElasticMapReduce::modifyInstanceGroups.

I just looked at the DefaultResourceRolePolicy, which is created by the wizard in the console (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html). These are the EMR actions it allows: "elasticmapreduce:Describe*", "elasticmapreduce:ListInstance*", "elasticmapreduce:AddJobFlowSteps"

I found that it does not allow ModifyInstanceGroups.
Please update your resource role policy to allow it, e.g. with "elasticmapreduce:*".
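
To see why those three patterns do not cover the failing call, here is a minimal Python sketch that matches an action name against the Allow patterns of a policy document. It is a deliberately simplified stand-in for IAM evaluation: it ignores Deny statements, Resource and Condition, and the helper names `make_policy` and `is_action_allowed` are made up for illustration (they are not part of any AWS SDK).

```python
import fnmatch

# EMR actions quoted above from the wizard-generated resource role policy.
DEFAULT_EMR_ACTIONS = [
    "elasticmapreduce:Describe*",
    "elasticmapreduce:ListInstance*",
    "elasticmapreduce:AddJobFlowSteps",
]


def make_policy(actions):
    """Build a single-statement Allow policy document as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": ["*"]}
        ],
    }


def is_action_allowed(policy, action):
    """Simplified check: True if any Allow statement has a matching Action pattern.

    This only does case-insensitive wildcard matching on action names;
    it does not model Deny statements, resources or conditions.
    """
    for statement in policy["Statement"]:
        if statement.get("Effect") != "Allow":
            continue
        patterns = statement.get("Action", [])
        if isinstance(patterns, str):
            patterns = [patterns]
        for pattern in patterns:
            if fnmatch.fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


needed = "elasticmapreduce:ModifyInstanceGroups"

default_policy = make_policy(DEFAULT_EMR_ACTIONS)
# Narrow fix: add only the action named in the AccessDeniedException.
narrow_policy = make_policy(DEFAULT_EMR_ACTIONS + [needed])
# Broad fix suggested in this answer.
broad_policy = make_policy(DEFAULT_EMR_ACTIONS + ["elasticmapreduce:*"])

print(is_action_allowed(default_policy, needed))  # False
print(is_action_allowed(narrow_policy, needed))   # True
print(is_action_allowed(broad_policy, needed))    # True
```

Adding only "elasticmapreduce:ModifyInstanceGroups" is the narrower of the two fixes; "elasticmapreduce:*" is simpler but grants every EMR action to the resource role.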

Thanks for reporting this bug. In the meantime, we will work on fixing the default resource role policy generated by the console wizard.

Aravind R.