I am a bit new to Hadoop. As far as I know, buckets are a fixed number of partitions in a Hive table, and Hive uses as many reducers as the total number of buckets defined when the table is created. So can anyone tell me how to calculate the total number of buckets for a Hive table? Is there a formula for it?
How can we decide the total number of buckets for a Hive table?
Asked by Biswa Bandana Nayak
Answer (score 0)
Let's take a scenario where the table size is 2300 MB and the HDFS block size is 128 MB.
Now, divide: 2300 / 128 ≈ 17.97.
Next, round the number of buckets up to a power of 2 (a common convention; Hive itself accepts any positive bucket count).
So we need the smallest n such that 2^n ≥ 17.97.
2^4 = 16 is too small, so n = 5.
So I am going to use 2^5 = 32 buckets.
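The calculation above can be sketched as a small Python function. `bucket_count` is a hypothetical helper, not part of Hive or Hadoop; it assumes both sizes are given in the same unit and that rows spread evenly across buckets. It doubles from 1 until the count reaches the ratio, so an exact power-of-two ratio such as 16 stays at 16 instead of being doubled.

```python
def bucket_count(table_size: float, block_size: float) -> int:
    """Return the smallest power of two that is >= table_size / block_size.

    Both sizes must use the same unit (e.g. MB). If rows are spread evenly
    across buckets, each bucket then averages at most one HDFS block.
    """
    if table_size <= 0 or block_size <= 0:
        raise ValueError("sizes must be positive")
    ratio = table_size / block_size
    buckets = 1
    # Double until the bucket count covers the number of blocks the table needs.
    while buckets < ratio:
        buckets *= 2
    return buckets


# The scenario from this answer: 2300 MB table, 128 MB blocks.
print(bucket_count(2300, 128))  # prints 32
```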
Hope this helps some of you.
Answer (score 4)
If you want to know how many buckets to choose in the CLUSTERED BY clause of your CREATE TABLE statement, I believe it is good to choose a number that makes each bucket file at or just below your HDFS block size.
This should help avoid creating many files that are much smaller than a block, each of which still costs NameNode memory.
Also choose a number that is a power of two.
You can check your HDFS block size with:
hdfs getconf -confKey dfs.blocksize
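To script this check, the value printed by the command above can be parsed and compared with the average bucket size. This is a Python sketch under stated assumptions: the function names are hypothetical, `dfs.blocksize` is assumed to print either as plain bytes (e.g. 134217728, the usual 128 MB default) or with a binary size suffix such as `128m`, and rows are assumed to spread evenly across buckets. `read_block_size` needs the `hdfs` client on the PATH.

```python
import subprocess

# Binary multipliers for size suffixes such as "128m".
# Assumption: this mirrors Hadoop's k/m/g/t/p/e suffix handling.
_SUFFIXES = {"k": 1024, "m": 1024**2, "g": 1024**3,
             "t": 1024**4, "p": 1024**5, "e": 1024**6}


def parse_block_size(value: str) -> int:
    """Convert a dfs.blocksize value like '134217728' or '128m' to bytes."""
    text = value.strip().lower()
    if text and text[-1] in _SUFFIXES:
        return int(text[:-1]) * _SUFFIXES[text[-1]]
    return int(text)


def read_block_size() -> int:
    """Run `hdfs getconf -confKey dfs.blocksize` and return the size in bytes."""
    out = subprocess.run(
        ["hdfs", "getconf", "-confKey", "dfs.blocksize"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_block_size(out)


def avg_bucket_bytes(table_bytes: int, num_buckets: int) -> float:
    """Average size of one bucket file, assuming rows spread evenly."""
    return table_bytes / num_buckets


def fits_in_block(table_bytes: int, num_buckets: int, block_bytes: int) -> bool:
    """True if the average bucket file is at or below one HDFS block."""
    return avg_bucket_bytes(table_bytes, num_buckets) <= block_bytes
```

For a 2300 MB table, 32 buckets average about 72 MB each and fit in a 128 MB block, while 16 buckets (about 144 MB each) do not.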