TPC-DS Query 6: Why do we need 'where j.i_category = i.i_category' condition?

139 views Asked by At

I'm going through TPC-DS for Amazon Athena.

It was fine until query 5.

I got some problem on query 6. (which is below)

select  a.ca_state state, count(*) cnt
 from customer_address a
     ,customer c
     ,store_sales s
     ,date_dim d
     ,item i
 where       a.ca_address_sk = c.c_current_addr_sk
    and c.c_customer_sk = s.ss_customer_sk
    and s.ss_sold_date_sk = d.d_date_sk
    and s.ss_item_sk = i.i_item_sk
    and d.d_month_seq = 
         (select distinct (d_month_seq)
          from date_dim
               where d_year = 2002
            and d_moy = 3 )
    and i.i_current_price > 1.2 * 
             (select avg(j.i_current_price) 
         from item j 
         where j.i_category = i.i_category)
 group by a.ca_state
 having count(*) >= 10
 order by cnt, a.ca_state 
 limit 100;

It took more than 30 minutes so it failed with timeout.

I tried to find which part cause problem, so I checked the where conditions and I found where j.i_category = i.i_category for the last part of where condition.

I don't know why this condition is needed so I deleted this part and the query ran Ok.

can you guys tell me why this part is needed?

1

There are 1 answers

0
Piotr Findeisen On

The j.i_category = i.i_category is subquery correlation condition. If you remove it from the subquery

select avg(j.i_current_price) 
from item j 
where j.i_category = i.i_category)

the subquery becomes uncorrelated, and becomes a global aggregation on the item table, which is easy to calculate and the query engine needs to do it once.

If you want a fast, performant query engine on AWS, i can recommend Starburst Presto (disclaimer: i am from Starburst). See https://www.concurrencylabs.com/blog/starburst-presto-vs-aws-redshift/ for a related comparison (note: this is not a comparison with Athena).

If it doesn't have to be that fast, you can use PrestoSQL on EMR (note that "PrestoSQL" and "Presto" components on EMR are not the same thing).