Dynamic partition in Hive


I have created a partitioned table in Hive, to be loaded with dynamic partitioning, as shown below:

create table sample(
  uuid String,
  date String,
  Name String,
  EmailID String,
  Comments String,
  CompanyName String,
  country String,
  url String,
  keyword String,
  source String
)
PARTITIONED BY (id String)
STORED AS PARQUET;

I have also set the following in the Hive shell:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.max.dynamic.partitions=100000000;
set hive.exec.max.dynamic.partitions.pernode=100000000;
set hive.exec.max.created.files = 100000000;

Is it good practice to set each of these dynamic partition limits to 100 million, as shown above?

1 Answer

Miguel

Dynamic partitions are designed for tables that will keep receiving new partition values. If your table is loaded through INSERT statements, using them is fine. Without dynamic partitioning, you either have to run a separate query to create each new partition, or you have to know the partition values in advance and name them explicitly in the insert:

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country='US')
   SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url, null, null, pvs.ip WHERE pvs.country = 'US'

You can find an example in the official Hive tutorial.
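
For comparison, here is a sketch of how the static insert above could look with a dynamic partition. It reuses the tutorial's page_view and page_view_stg tables and assumes country is the last partition column; dt stays static and Hive derives country from the last column of the SELECT. Treat it as an illustration, not a tested statement for your cluster.

```sql
-- Sketch only: table and column names are taken from the static example above.
-- Dynamic partitioning must be enabled; because dt is still given statically,
-- the default strict mode should be sufficient here.
SET hive.exec.dynamic.partition=true;

FROM page_view_stg pvs
INSERT OVERWRITE TABLE page_view PARTITION(dt='2008-06-08', country)
  -- The dynamic partition column (country) goes last in the SELECT list,
  -- and the WHERE filter on country is no longer needed.
  SELECT pvs.viewTime, pvs.userid, pvs.page_url, pvs.referrer_url,
         null, null, pvs.ip, pvs.country;
```

Each distinct pvs.country value in the staging data ends up in its own country partition under dt='2008-06-08', so a single statement replaces one static insert per country.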

Best practices for partitioning depend on the kind of data stored. For example:

  • Partitioning on unique values such as IDs is not recommended. (If every row has a different id value, each partition holds a single row, which is bad practice.)
  • The partition column needs enough distinct values. If it has only a few (a boolean field or similar), partitioning on it is also bad practice.
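
One way to check a candidate partition column against both points is to count its distinct values before settling on it. The queries below are a minimal sketch against the sample table and id column from the question; the aliases and the LIMIT value are arbitrary choices for illustration.

```sql
-- Sketch only: run against the table from the question.
-- If distinct_ids is close to total_rows, id is nearly unique and
-- almost every partition would hold a single row.
SELECT COUNT(DISTINCT id) AS distinct_ids,
       COUNT(*)           AS total_rows
FROM sample;

-- Rows per partition value, largest first, to see how evenly the
-- data is spread across the candidate column.
SELECT id, COUNT(*) AS rows_in_partition
FROM sample
GROUP BY id
ORDER BY rows_in_partition DESC
LIMIT 20;
```

If the first query shows close to one distinct value per row, the table is in the situation described in the first point above, and another column would be a better partition key.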