Hive Update partition vs MSCK Repair

384 views Asked by At

I have a table with thousands of partition. I want to change all the partition location to diff cluster. Ex:
for table test_table and partition day=2021041600
Old location: hdfs://cluster1/dir1/dir2/day=2021041600/\<files>
New location: hdfs://cluster2/dir1/dir2/day=2021041600/\<files>

I can achieve this using 2 ways.

  1. We can fetch the list of all the partitions and update the partition location for every partition 1 by 1.
  2. We can change the base location of the table and run the MSCK repair command on the table.

My question is which option would we better approach to take?

1

There are 1 answers

0
leftjoin On

1st approach will work.

2nd approach (MSCK repair):

MSCK REPAIR will not work if you change table location because partitions are mounted to old locations outside table location.

Make table EXTERNAL, DROP, CREATE with new location, run MSCK REPAIR:

alter table test_table  SET TBLPROPERTIES('EXTERNAL'='TRUE');  

drop table test_table;

create table test_table ... location 'hdfs://cluster2/dir1/dir2';

MSCK REPAIR TABLE test_table;