Pyathena is super slow compared to querying from Athena

5.9k views Asked by At

I run a query from AWS Athena console and takes 10s. The same query run from Sagemaker using PyAthena takes 155s. Is PyAthena slowing it down or is the data transfer from Athena to sagemaker so time consuming?

What could I do to speed this up?

1

There are 1 answers

3
room13 On BEST ANSWER

Just figure out a way of boosting the queries:

Before I was trying:

import pandas as pd
from pyathena import connect

conn = connect(s3_staging_dir=STAGIN_DIR,
             region_name=REGION)
pd.read_sql(QUERY, conn)
# takes 160s

Figured out that using a PandasCursor instead of a connection is way faster

import pandas as pd
pyathena import connect
from pyathena.pandas.cursor import PandasCursor

cursor = connect(s3_staging_dir=STAGIN_DIR,
                 region_name=REGION,
                 cursor_class=PandasCursor).cursor()
df = cursor.execute(QUERY).as_pandas()
# takes 12s

Ref: https://github.com/laughingman7743/PyAthena/issues/46