I have a table with thousands of partition. I want to change all the partition location to diff cluster.
Ex:
for table test_table and partition day=2021041600
Old location: hdfs://cluster1/dir1/dir2/day=2021041600/\<files>
New location: hdfs://cluster2/dir1/dir2/day=2021041600/\<files>
I can achieve this using 2 ways.
- We can fetch the list of all the partitions and update the partition location for every partition 1 by 1.
- We can change the base location of the table and run the MSCK repair command on the table.
My question is which option would we better approach to take?
1st approach will work.
2nd approach (MSCK repair):
MSCK REPAIR
will not work if you change table location because partitions are mounted to old locations outside table location.Make table EXTERNAL, DROP, CREATE with new location, run MSCK REPAIR: