I am trying to set up Presto on an AWS EC2 instance so that I can run queries on files stored in S3. I know you can (and arguably should) use EMR, but I am restricted to EC2. I have created a Derby DB for my metastore, set up HDFS on the bucket, and am able to query data files in S3 using Hive. In the Hive CLI, I can run SELECT * FROM testpresto; (testpresto is my table name) and it correctly displays all the contents of my S3 txt file. I connected Presto to the Hive metastore, so my table displays in the Presto CLI via SHOW TABLES; and DESCRIBE testpresto;.

However, when I run SELECT * FROM testpresto; the query times out with the error below.
Query 20170109_165917_00007_7pyam failed: Unable to execute HTTP request: Connect to ${MY_BUCKET}.s3-us-west-1.amazonaws.com:443 [${MY_BUCKET}.s3-us-west-1.amazonaws.com/54.231.237.24] failed: connect timed out
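This first error is a TCP connect timeout on port 443. That step happens before any credentials are sent, so it may point at networking (for example security group, network ACL or route-table egress rules) rather than at the Hive or Presto configuration. A quick way to separate the two is to check, from the Presto node itself, whether the S3 endpoint can be reached at all. This is a minimal sketch; the function name can_reach is hypothetical.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake only;
        # no TLS and no AWS credentials are involved at this stage.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is an OSError subclass),
        # refused connections and DNS resolution failures.
        return False
```

For example, calling can_reach("${MY_BUCKET}.s3-us-west-1.amazonaws.com") with ${MY_BUCKET} replaced by the real bucket name: a False result would suggest the instance cannot open connections to S3, independently of which credentials Presto is configured to use.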
When I try to create a new schema via the Presto CLI, I get a more descriptive error:
Query 20170109_175329_00016_7pyam failed: java.lang.IllegalArgumentException: AWS Access Key ID and Secret Access Key must be specified as the username or password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or fs.s3.awsSecretAccessKey properties (respectively).
Unfortunately, nothing I do resolves this error. From what I've read online, either adding a resource reference to my Presto config (catalog/hive.properties) through

hive.config.resources=/usr/local/hadoop/etc/hadoop/core-site.xml,/usr/local/hadoop/etc/hadoop/hdfs-site.xml

or adding the keys directly via hive.s3.aws-access-key and hive.s3.aws-secret-key should enable Presto to read from S3. I've also tried setting hive.s3.use-instance-credentials=true (both with and without the key settings) to use the IAM role, but every variant produces the same error. Is there some other setting that I am missing? I don't understand why Hive is able to query but Presto is not.
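Since several combinations were tried, it may help to confirm what the catalog file on the Presto node actually contains. The sketch below is a hypothetical helper (parse_properties and s3_settings_report are illustrative names, not part of Presto). It reports which of the S3 settings mentioned in the question are present and whether the files listed in hive.config.resources exist on the machine where it runs. It does not reproduce Presto's own precedence rules between these settings.

```python
from __future__ import annotations

from pathlib import Path


def parse_properties(text: str) -> dict[str, str]:
    """Minimal key=value parser: skips blank lines and #/! comments.

    Only '=' separators are handled; line continuations are not supported.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def s3_settings_report(props: dict[str, str]) -> list[str]:
    """Describe the S3-related settings found in a Presto hive catalog file."""
    notes: list[str] = []
    has_key = bool(props.get("hive.s3.aws-access-key"))
    has_secret = bool(props.get("hive.s3.aws-secret-key"))
    if has_key != has_secret:
        notes.append("only one of hive.s3.aws-access-key / hive.s3.aws-secret-key is set")
    elif has_key:
        notes.append("static access key and secret key are set")
    if props.get("hive.s3.use-instance-credentials", "").lower() == "true":
        notes.append("instance (IAM role) credentials are enabled")
    # hive.config.resources is a comma-separated list of Hadoop XML files.
    resources = props.get("hive.config.resources", "").split(",")
    for res in (r.strip() for r in resources):
        if res and not Path(res).is_file():
            notes.append(f"hive.config.resources entry not found: {res}")
    return notes
```

For example, run s3_settings_report(parse_properties(Path("etc/catalog/hive.properties").read_text())) on the Presto node, adjusting the path to wherever your catalog directory lives. A missing resource file or a half-configured key pair would be worth ruling out before looking elsewhere.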
When launching an EC2 instance, you have the opportunity to assign it an IAM role. At the time of writing, the IAM role must be assigned at launch time: once an instance is launched, you cannot assign it a role or change its role.
I think you should create an IAM role that has the required access to the S3 bucket, then launch a new EC2 instance and assign it that role. As soon as this instance is up, SSH in and run

aws s3 ls

to see which buckets the instance has access to. If you configured the role correctly, it should be able to list your bucket(s). From there on, Presto should work.
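The same check can be scripted in Python with boto3, which, when no keys are found in environment variables or the ~/.aws files, falls back to the instance profile credentials of the attached IAM role. This is a sketch assuming boto3 is installed on the instance; list_visible_buckets and sample_keys are hypothetical helper names. Note that aws s3 ls (like list_buckets) needs the s3:ListAllMyBuckets permission, so a role scoped to a single bucket can fail that call while still being able to read the bucket. sample_keys lists a few objects from one bucket, which is closer to what Presto needs.

```python
def _default_client():
    import boto3  # assumption: boto3 is installed on the EC2 instance

    # With no explicit keys configured, boto3's credential chain ends at the
    # instance metadata service, i.e. the IAM role attached to the instance.
    return boto3.client("s3")


def list_visible_buckets(s3_client=None) -> list:
    """Return bucket names visible to the current credentials (roughly what `aws s3 ls` prints)."""
    client = s3_client if s3_client is not None else _default_client()
    response = client.list_buckets()
    return [bucket["Name"] for bucket in response.get("Buckets", [])]


def sample_keys(bucket: str, s3_client=None, limit: int = 5) -> list:
    """Return up to `limit` object keys from `bucket`; raises if access is denied."""
    client = s3_client if s3_client is not None else _default_client()
    response = client.list_objects_v2(Bucket=bucket, MaxKeys=limit)
    return [obj["Key"] for obj in response.get("Contents", [])]
```

For example, sample_keys("my-bucket") with your real bucket name: an AccessDenied error there points at the role's policy rather than at the Presto configuration.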