How to get the full results of a query as a CSV file using AWS/Athena from the CLI?

I need to download the full content of a table that I have in my AWS/Glue/Catalog using AWS/Athena. At the moment I run select * from my_table from the Dashboard and save the result locally as CSV, also from the Dashboard. Is there a way to get the same result using AWS/CLI?

From the documentation I can see https://docs.aws.amazon.com/cli/latest/reference/athena/get-query-results.html but it is not quite what I need.

There are 2 answers

John Rotenstein (best answer)

You cannot save results from the AWS CLI directly, but you can specify a query result location, and Amazon Athena will automatically save a copy of the query results in the Amazon S3 location that you specify.

You could then use the AWS CLI to download that results file.
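
As a rough sketch of that download step: assuming Athena wrote the result as <QueryExecutionId>.csv directly under the configured output location (its usual layout for SELECT queries), the file's S3 URI can be built and copied with aws s3 cp. The bucket, prefix, and query ID are placeholders, and athena_result_uri and download_athena_result are hypothetical helper names, not part of the AWS CLI.

```shell
#!/usr/bin/env bash

# Build the S3 URI of a query's CSV result.
# Assumption: Athena stores it as <output location>/<query execution id>.csv.
athena_result_uri() {
  local output_location=${1%/}   # drop one trailing slash, if present
  local query_execution_id=$2
  printf '%s/%s.csv\n' "$output_location" "$query_execution_id"
}

# Copy the CSV to a local file (needs the AWS CLI and read access to the bucket).
# Arguments: output location, query execution ID, local destination path.
download_athena_result() {
  aws s3 cp "$(athena_result_uri "$1" "$2")" "$3"
}

# Example call with placeholder values:
# download_athena_result s3://example/location 01234567-89ab-cdef-0123-456789abcdef ./results.csv
```

The query execution ID is shown in the Athena console's query history, or is returned when the query is started from the CLI.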

Theo

You can run an Athena query from the AWS CLI with the aws athena start-query-execution API call. You then need to poll with aws athena get-query-execution until the query has finished. Once it has, the result of that call also contains the location of the query result on S3, which you can then download with aws s3 cp.

Here's an example script:

#!/usr/bin/env bash

region=us-east-1 # change this to the region you are using
query='SELECT NOW()' # change this to your query
output_location='s3://example/location' # change this to a writable location

query_execution_id=$(aws athena start-query-execution \
  --region "$region" \
  --query-string "$query" \
  --result-configuration "OutputLocation=$output_location" \
  --query QueryExecutionId \
  --output text)

while true; do
  status=$(aws athena get-query-execution \
    --region "$region" \
    --query-execution-id "$query_execution_id" \
    --query QueryExecution.Status.State \
    --output text)
  # QUEUED and RUNNING both mean the query has not finished yet
  if [[ $status != 'QUEUED' && $status != 'RUNNING' ]]; then
    break
  else
    sleep 5
  fi
done

if [[ $status = 'SUCCEEDED' ]]; then
  result_location=$(aws athena get-query-execution \
    --region "$region" \
    --query-execution-id "$query_execution_id" \
    --query QueryExecution.ResultConfiguration.OutputLocation \
    --output text)
  exec aws s3 cp "$result_location" -
else
  reason=$(aws athena get-query-execution \
    --region "$region" \
    --query-execution-id "$query_execution_id" \
    --query QueryExecution.Status.StateChangeReason \
    --output text)
  echo "Query $query_execution_id failed: $reason" 1>&2
  exit 1
fi

If your primary work group has an output location, or you want to use a different work group that also has a defined output location, you can modify the start-query-execution call accordingly. Otherwise, you probably have an S3 bucket called aws-athena-query-results-NNNNNNN-XX-XXXX-N that Athena created at some point and that is used for outputs when you use the UI.
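
For the work group case, a minimal sketch of the modified call: start-query-execution accepts a --work-group option, and when that work group defines its own output location the --result-configuration argument can be left out. The function name start_query_in_workgroup is a hypothetical helper, and the region, work group, and query in the example call are placeholders.

```shell
#!/usr/bin/env bash

# Start a query in a work group that already defines an output location,
# so no --result-configuration is passed. Prints the query execution ID.
# Arguments: region, work group name, query string.
start_query_in_workgroup() {
  aws athena start-query-execution \
    --region "$1" \
    --work-group "$2" \
    --query-string "$3" \
    --query QueryExecutionId \
    --output text
}

# Example call with placeholder values:
# query_execution_id=$(start_query_in_workgroup us-east-1 primary 'SELECT NOW()')
```

The polling and download steps stay the same, since get-query-execution still reports the output location that was actually used.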