How many reducers will run if I use DISTINCT on all columns in Hive?


I am running a Hive query where DISTINCT is applied to all of the selected columns, and I noticed that a couple of reducers are launched for that single query. Can anyone explain the reason behind this?

 Example query:

SELECT DISTINCT
       seg.col1,
       seg.col2,
       seg.col3
FROM user.ag_user seg
WHERE '2018-05-06' BETWEEN start_date AND end_date
LIMIT 5;

There is 1 answer

shaine

It can be a little more complicated than "query X leads to Y reducers".

This answer covers the default case in a little more detail (better than I could):

Default number of reducers
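
As a rough sketch of that default case: assuming the MapReduce execution engine and no explicitly fixed reducer count, Hive estimates the number of reducers from the input size, approximately min(hive.exec.reducers.max, input bytes / hive.exec.reducers.bytes.per.reducer). A DISTINCT over all selected columns behaves like a GROUP BY on those columns, so it needs a reduce phase, and the LIMIT may add a further stage. The property values below are illustrative only, not recommendations, and the actual plan depends on your Hive version and engine.

```sql
-- Illustrative values only; run `SET <property>;` to see what your cluster uses.

-- Target amount of input data handled by each reducer, in bytes.
SET hive.exec.reducers.bytes.per.reducer=256000000;

-- Upper bound on the number of reducers Hive will request.
SET hive.exec.reducers.max=1009;

-- -1 lets Hive estimate the reducer count; a positive value fixes it.
SET mapreduce.job.reduces=-1;

-- Show the stages Hive plans for the query, including its reduce phases.
EXPLAIN
SELECT DISTINCT
       seg.col1,
       seg.col2,
       seg.col3
FROM user.ag_user seg
WHERE '2018-05-06' BETWEEN start_date AND end_date
LIMIT 5;
```

Comparing the EXPLAIN output with the reducer counts printed in the job log should show which stage each reducer belongs to.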