Amazon SageMaker Studio Data Wrangler: Athena query failing for large datasets

Question


Asked by user2242666 on 15 December 2022 at 20:49 · 409 views

I am trying to query a large dataset from Athena using Data Wrangler, and the query fails for large datasets. I am setting up a Data Wrangler flow through the UI in SageMaker Studio and adding Athena as a source.

Some observations:

- Small Athena queries work.
- The same dataset can be read successfully from S3 after querying it with Athena.
- The UI first shows a warning that the query is taking longer than usual, then a failure message with no specific reason. There is nothing useful in the CloudFormation logs either.
- The same query completes when run directly in Athena, in around 30 minutes.

Has anyone encountered a similar problem? Are there any timeout settings for Data Wrangler?
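Since the same query completes when run directly in Athena and its output can then be read from S3, one way to sidestep the Data Wrangler UI is to run the query yourself with an explicit, client-side time limit. This is a minimal sketch, assuming an object that behaves like boto3's Athena client (`start_query_execution`, `get_query_execution`, `stop_query_execution`); the function name `run_athena_query`, its parameters, and the default limits are hypothetical illustrations, not a Data Wrangler setting.

```python
import time
from typing import Any, Callable


def run_athena_query(
    athena: Any,
    sql: str,
    database: str,
    output_s3: str,
    timeout_s: float = 3600.0,
    poll_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start an Athena query, wait for it, and return the S3 URI of its result file.

    `athena` is assumed to behave like boto3.client("athena").
    """
    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    qid = resp["QueryExecutionId"]

    waited = 0.0
    while True:
        execution = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]
        status = execution["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            # Full S3 path of the result file written by Athena.
            return execution["ResultConfiguration"]["OutputLocation"]
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "no reason given")
            raise RuntimeError(f"Athena query {qid} {state}: {reason}")
        if waited >= timeout_s:
            # Hypothetical client-side limit: stop the query instead of leaving it running.
            athena.stop_query_execution(QueryExecutionId=qid)
            raise TimeoutError(f"Athena query {qid} still {state} after {waited:.0f}s")
        sleep(poll_s)
        waited += poll_s
```

For example, `run_athena_query(boto3.client("athena"), sql, "my_db", "s3://my-bucket/athena-results/")` (placeholder database and bucket) returns the S3 URI of the result file, which can then be imported into Data Wrangler as an S3 source. Counting polled intervals instead of reading a clock keeps the function easy to test with a fake client.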

Original Q&A

There is 1 answer

**Luk3rson** · Answer 1 · 2023-03-01T16:29:27+00:00

I had the same issue with Snowflake as a source. I opened a support ticket, and according to AWS Support they are working on improving performance for large datasets.

As a workaround, export the flow to a SageMaker pipeline and run it as a Processing job on multiple instances; that way it runs in a distributed environment using Spark.
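The export in this workaround typically produces a notebook that configures the processing job for you; the sketch here only illustrates where the instance count and the job's own run-time limit sit in a boto3 `create_processing_job` request. It is a minimal sketch under stated assumptions: the function name `build_processing_job_request`, the input/output names, container paths, instance type, and volume size are hypothetical placeholders, and the Data Wrangler container image URI has to come from the exported notebook.

```python
from typing import Any, Dict


def build_processing_job_request(
    job_name: str,
    role_arn: str,
    image_uri: str,
    flow_s3_uri: str,
    output_s3_uri: str,
    instance_count: int = 2,
    instance_type: str = "ml.m5.4xlarge",
    max_runtime_s: int = 24 * 3600,
) -> Dict[str, Any]:
    """Build keyword arguments for boto3's SageMaker create_processing_job (hypothetical helper)."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        "AppSpecification": {"ImageUri": image_uri},
        # Hypothetical input name and container path for the exported .flow file.
        "ProcessingInputs": [{
            "InputName": "flow",
            "S3Input": {
                "S3Uri": flow_s3_uri,
                "LocalPath": "/opt/ml/processing/flow",
                "S3DataType": "S3Prefix",
                "S3InputMode": "File",
            },
        }],
        # Hypothetical output name and container path.
        "ProcessingOutputConfig": {"Outputs": [{
            "OutputName": "output",
            "S3Output": {
                "S3Uri": output_s3_uri,
                "LocalPath": "/opt/ml/processing/output",
                "S3UploadMode": "EndOfJob",
            },
        }]},
        # More than one instance lets the job's work be spread across a cluster.
        "ProcessingResources": {"ClusterConfig": {
            "InstanceCount": instance_count,
            "InstanceType": instance_type,
            "VolumeSizeInGB": 30,
        }},
        # The job's own time limit, set explicitly rather than relying on a default.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_s},
    }
```

The resulting dictionary would be passed as `boto3.client("sagemaker").create_processing_job(**request)`. Keeping the request construction separate from the API call makes it easy to inspect the instance count and run-time limit before launching anything.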
