how to make max function in hive query to ignore _HIVE_DEFAULT_PARTITION__

562 views Asked by At

I have a view which uses max to show the latest partition (which is of format 2021-01, 2021-02, 2021-03, 2021-04). The hive table has _HIVE_DEFAULT_PARTITION__ too.

When we run the query in Impala, max on partitions gives the correct value of 2021-04 ignoring _HIVE_DEFAULT_PARTITION__ but the same do not work when we run the query in Hive as it returns _HIVE_DEFAULT_PARTITION__

Is there any way to make Hive query ignore the default partition if exists while returning max on that column?

1

There are 1 answers

0
leftjoin On

You can filter it:

select max(partition_col) from your_table where partition_col != "__HIVE_DEFAULT_PARTITION__"

If you do not need data in __HIVE_DEFAULT_PARTITION__, you can drop it:

ALTER TABLE your_table  DROP PARTITION (partition_col='__HIVE_DEFAULT_PARTITION__');

Transforming __HIVE_DEFAULT_PARTITION__ to NULL can be a solution if with max(partition_col) you want to aggregate something else and do not want to excluse __HIVE_DEFAULT_PARTITION__ partition:

select max(case when partition_col = "__HIVE_DEFAULT_PARTITION__" then NULL else partition_col end) as max_partition_col,
       --aggregate something else including HIVE_DEFAULT_PARTITION
  from your_table