Error reading data from AWS Athena using awswrangler

I am using Python 3.

I am trying to read data from AWS Athena using the awswrangler package.

Below is the code:

import boto3
import awswrangler as wr
import pandas as pd

df_dynamic=wr.athena.read_sql_query("select * from test",database="tst")

Error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.6/site-packages/awswrangler/_config.py", line 361, in wrapper

      File "/usr/local/lib/python3.6/site-packages/botocore/regions.py", line 148, in _endpoint_for_partition
        raise NoRegionError()
    botocore.exceptions.NoRegionError: You must specify a region.

I am not sure what to specify, or where, for the SQL query to work.

There is 1 answer

Andre.IDK (best answer)

All interactions with the AWS APIs (including via an SDK such as boto3) require credentials. The boto3 documentation on credentials explains how boto3 looks them up.

Since you're running this on an EC2 instance, best practice is to manage credentials via an instance profile. Assuming that you have already assigned an IAM role to the EC2 instance, all you need to do is specify a region in your code. The official AWS documentation explains how to assign an IAM role to an EC2 instance.

AWS Data Wrangler relies on boto3, which lets you specify a region like so:

boto3.setup_default_session(region_name="us-east-2")

Source: AWS Data Wrangler - Sessions
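
Applied to the query from the question, the fix could look roughly like the sketch below. The helper names pick_region and read_test_table are hypothetical, and "us-east-2" is only the placeholder region from the example above; the sketch also assumes that falling back to the AWS_DEFAULT_REGION environment variable is acceptable in your setup.

```python
import os


def pick_region(explicit=None, fallback="us-east-2"):
    """Hypothetical helper: an explicit argument wins, then the
    AWS_DEFAULT_REGION environment variable, then a hardcoded fallback."""
    return explicit or os.environ.get("AWS_DEFAULT_REGION") or fallback


def read_test_table(region=None):
    """Set the default boto3 session region, then run the Athena query."""
    # Imported lazily so this module loads even without the AWS packages.
    import boto3
    import awswrangler as wr

    # Must run before the query so awswrangler picks up the default session.
    boto3.setup_default_session(region_name=pick_region(region))
    return wr.athena.read_sql_query("select * from test", database="tst")
```

Calling read_test_table() on the instance should then no longer raise NoRegionError, provided the instance role has permission to query Athena and to read and write the query results in S3.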

You can either hardcode the region as in the example above, or retrieve the region in which the EC2 instance is deployed from the instance metadata endpoint.

The following request:

curl http://169.254.169.254/latest/dynamic/instance-identity/document

returns a JSON document that contains, among other information, the region of the EC2 instance:

{
  "privateIp" : "172.31.2.15",
  "instanceId" : "i-12341ee8",
  "billingProducts" : null,
  "instanceType" : "t2.small",
  "accountId" : "1234567890",
  "pendingTime" : "2015-11-03T03:09:54Z",
  "imageId" : "ami-383c1956",
  "kernelId" : null,
  "ramdiskId" : null,
  "architecture" : "x86_64",
  "region" : "ap-northeast-1", # <- region
  "version" : "2010-08-31",
  "availabilityZone" : "ap-northeast-1c",
  "devpayProductCodes" : null
}

You can easily implement this request in Python or by other means if needed.
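
As a rough illustration, the same request can be made from Python with only the standard library. This is a sketch, not a definitive implementation: the function names are hypothetical, and it assumes the instance metadata service is enabled and that requesting a session token first (IMDSv2) is appropriate for your instance.

```python
import json
import urllib.request

METADATA_BASE = "http://169.254.169.254/latest"


def region_from_identity_document(document_text):
    """Extract the "region" field from the identity document JSON."""
    return json.loads(document_text)["region"]


def fetch_instance_region(timeout=2.0):
    """Query the metadata endpoint; only works from inside an EC2 instance."""
    # Assumption: use IMDSv2, which needs a short-lived session token first.
    token_request = urllib.request.Request(
        METADATA_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode("utf-8")

    # Same URL as the curl command above, with the token attached.
    document_request = urllib.request.Request(
        METADATA_BASE + "/dynamic/instance-identity/document",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(document_request, timeout=timeout) as response:
        return region_from_identity_document(response.read().decode("utf-8"))
```

The result can then be passed straight to boto3, for example boto3.setup_default_session(region_name=fetch_instance_region()).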