Presto fails to recognize AWS credentials (both IAM and keys)?


I am trying to set up Presto on an AWS EC2 instance so that I can run queries on files stored in S3. I know you can/should use EMR, but I am restricted to EC2. I have created a Derby DB for my metastore, set up HDFS on the bucket, and am able to query data files in S3 using Hive. In the Hive CLI, I can run SELECT * FROM testpresto; (testpresto is my table name) and it correctly displays all the contents of my S3 txt file. I connected Presto to the Hive metastore, so my table shows up in the Presto CLI via SHOW TABLES; and DESCRIBE testpresto;.

However, when I run SELECT * FROM testpresto; the query times out with the error below.

Query 20170109_165917_00007_7pyam failed: Unable to execute HTTP request: Connect to ${MY_BUCKET}.s3-us-west-1.amazonaws.com:443 [${MY_BUCKET}.s3-us-west-1.amazonaws.com/54.231.237.24] failed: connect timed out

When I try to create a new schema via the Presto CLI, I get a more descriptive error.

Query 20170109_175329_00016_7pyam failed: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).

Unfortunately, nothing that I do resolves this error. From what I've read online, it seems that adding a resource reference to my Presto config (catalog/hive.properties) via

hive.config.resources=/usr/local/hadoop/etc/hadoop/core-site.xml,/usr/local/hadoop/etc/hadoop/hdfs-site.xml

or adding the keys directly via hive.s3.aws-access-key and hive.s3.aws-secret-key should enable Presto to read from S3. I've also tried hive.s3.use-instance-credentials=true (both with and without setting the key configs) to use the IAM role, but everything produces the same error. Is there some other setting that I am missing? I don't understand why Hive is able to query but Presto is not.
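
Here is a sketch of what I currently have in catalog/hive.properties (keys redacted; the metastore URI is an assumption of a local Thrift service on the default port):

connector.name=hive-hadoop2
# assumed: metastore Thrift service running locally on the default port
hive.metastore.uri=thrift://localhost:9083
hive.config.resources=/usr/local/hadoop/etc/hadoop/core-site.xml,/usr/local/hadoop/etc/hadoop/hdfs-site.xml
# attempt 1: rely on the instance's IAM role
hive.s3.use-instance-credentials=true
# attempt 2 (tried separately): explicit keys instead of the role
# hive.s3.aws-access-key=AKIA...
# hive.s3.aws-secret-key=...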

There are 3 answers

AudioBubble:

When launching an EC2 instance, you have the opportunity to assign it an IAM role. The IAM role must be assigned at launch time; once an instance is launched, you cannot assign it a role or change its role.

I think you should create an IAM role that has the required access to the S3 bucket, then launch a new EC2 instance and assign it that role. As soon as the instance is up, SSH in and run aws s3 ls to see which buckets the instance has access to. If you configured the role correctly, it should be able to list your bucket(s). From there on, Presto should work.
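
As a minimal sanity check, assuming the AWS CLI is installed (the bucket name below is a placeholder, and 169.254.169.254 is the standard EC2 instance metadata endpoint):

# show which role, if any, the instance metadata service exposes
curl http://169.254.169.254/latest/meta-data/iam/security-credentials/
# list buckets/objects with whatever credentials the CLI picks up
aws s3 ls
aws s3 ls s3://my-bucket/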

stevel:

If you are working with EMR, you are using the Amazon S3 client, not the ASF code. That means the S3A code (e.g. its credential provider chain) isn't going to be there. Ignore any references to HADOOP-* JIRAs or docs under http://hadoop.apache.org. Sorry.

Piotr Findeisen:

I tested Presto with S3 and the Hive metastore using the s3a:// scheme for accessing S3. Since you have Hive talking to S3, the rest should be easy:

  • you can assign an IAM role that allows your EC2 instance to talk to S3. In Presto 0.157 this works out of the box, since the hive.s3.use-instance-credentials config property of the Hive connector defaults to true
  • alternatively, you can set the following in the Hive connector's configuration file (usually catalog/hive.properties); see the sketch after this list:
    • hive.s3.use-instance-credentials=false
    • hive.s3.aws-access-key=...
    • hive.s3.aws-secret-key=...
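
After editing the file, restart the Presto server so it picks up the catalog change, then re-test. A minimal sketch, assuming a single-node install, the CLI executable in the current directory, and the default 8080 HTTP port:

# restart the server to reload catalog/hive.properties
bin/launcher restart
# re-run the failing query through the CLI
./presto --server localhost:8080 --catalog hive --schema default
presto> SELECT * FROM testpresto LIMIT 10;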

Since I understand you tested these options and they didn't work for you, you might still try the following:

  • retry
  • try the s3a scheme, if you haven't already
  • upgrade if you're well behind the latest release, or try the exact same version that worked for me (0.157.1-t.1)
  • make sure there is no network-level configuration blocking S3 access from the Presto machine (see the connectivity check below)
  • make sure your IAM role really grants S3 access (or use explicit keys as a temporary workaround)
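
For the network-level check, a minimal probe from the Presto machine (the bucket name is a placeholder; the endpoint is taken from your error message):

# does the S3 endpoint answer at all from this host?
curl -sv https://my-bucket.s3-us-west-1.amazonaws.com/ -o /dev/null
# a connect timeout here points at security groups, NACLs, or routing
# (e.g. no internet gateway or S3 VPC endpoint on the subnet), not at Presto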