Can a column be used for both partitioning and bucketing in Hive?

Can the same column be used for both partitioning and bucketing in a Hive table? I'm confused about how to use them together.

My Hive table is partitioned by date. Because a single day's data is huge, I want to further divide each day into 4 parts so that I can read and process each part separately.

Answer by Koushik Roy

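Bucketing and partitioning complement each other, but not on the same column. A partition column is not stored as a regular data column (each of its values becomes a separate directory), and CLUSTERED BY can only refer to regular columns. Even if you could combine them on one column, it would not split the data the way you want: every row inside a single date partition has the same date, so hashing on date would put the whole day into one bucket. Bucketing also splits every partition into the same fixed number of buckets, even when that partition is small. So if the data on some days is huge, try bucketing on the date column instead, and partition on a categorical column with fewer distinct values, such as geography, age group or role.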
For example, you could try something like this:

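CREATE TABLE mytable(
  RecordNumber int,
  City string,
  Zipcode int,
  date_entered date
)
-- one sub-directory per country value
PARTITIONED BY(country STRING)
-- inside each country partition, rows are hashed on date_entered into 300 bucket files
CLUSTERED BY(date_entered) INTO 300 BUCKETS;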
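To load such a table and then process one bucket at a time, something along these lines should work. This is only a sketch under stated assumptions: `staging_table` is a hypothetical source table holding the same columns plus `country`, and 'US' is just an example partition value. Because `country` is filled dynamically, dynamic-partition mode has to be `nonstrict`. `TABLESAMPLE ... ON date_entered` with the table's own bucket count lets Hive read just one bucket file per partition.

```sql
-- Assumed names: staging_table is a hypothetical source table with the same
-- columns as mytable plus country; 'US' is only an example partition value.

-- On Hive versions before 2.0 you may also need:
--   SET hive.enforce.bucketing = true;
-- country is filled dynamically, so strict mode (which requires a static partition) must be off.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

-- The dynamic partition column (country) must be the last column in the SELECT list.
INSERT OVERWRITE TABLE mytable PARTITION (country)
SELECT RecordNumber, City, Zipcode, date_entered, country
FROM staging_table;

-- Read one bucket of one partition; repeat with BUCKET 2, 3, ..., 300 to process each part.
SELECT *
FROM mytable TABLESAMPLE (BUCKET 1 OUT OF 300 ON date_entered) t
WHERE t.country = 'US';
```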