Note: "table" here means a Hive table, which is backed by an HDFS directory.
I have two clusters, C1 and C2. C1 has a table item.name stored as SequenceFile. C2 has a table item.name stored as ORC, which holds the same data as C1.
Whenever I need to use distcp to copy the data of item.name from C1 to C2, I have to drop the current table on C2, re-create it as SequenceFile before running distcp, and finally re-create it as ORC. This is troublesome given the huge data volume and the fact that it is a daily task.
I am considering creating two tables on C2: item.name (ORC) and item.name_seq (SequenceFile). With this approach, I would copy the data of item.name from C1 into item.name_seq on C2, and after the copy finishes, insert from item.name_seq into the ORC table item.name on C2. Is this approach good? Is it possible to achieve?
Basically, with distcp I need to copy data from item.name on C1 to item.name_seq on C2, along the lines of the sketch below.
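For concreteness, this is roughly what I have in mind; the columns, paths, and host names are placeholders, not my real schema:

    -- One-time setup on C2 (placeholder schema):
    CREATE EXTERNAL TABLE item.name_seq (
      id   BIGINT,
      name STRING
    )
    STORED AS SEQUENCEFILE
    LOCATION '/apps/hive/warehouse/item.db/name_seq';

    -- Daily: land C1's files in the staging location, e.g.
    --   hadoop distcp -update hdfs://c1-nn:8020/apps/hive/warehouse/item.db/name \
    --                 hdfs://c2-nn:8020/apps/hive/warehouse/item.db/name_seq

    -- Then reload the ORC table from the staging table:
    INSERT OVERWRITE TABLE item.name
    SELECT * FROM item.name_seq;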
Please let me know if you have a better approach.
You could instead use SparkSQL to read/write between the Hive servers while changing table column names and formats.
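A minimal sketch of that idea, assuming Spark 2.3+ with its metastore pointed at C1 and network access to C2's HDFS (host names, ports, and paths are placeholders):

    -- Read the SequenceFile-backed table through C1's metastore and write
    -- ORC files directly into the C2 table's directory.
    INSERT OVERWRITE DIRECTORY 'hdfs://c2-nn:8020/apps/hive/warehouse/item.db/name'
    STORED AS ORC
    SELECT id, name        -- columns could be renamed or cast here if needed
    FROM item.name;

Since the existing ORC table on C2 is defined over that directory, it sees the new files once the write completes; no drop/re-create is required.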
Otherwise, as you mentioned, you'd need to re-create the table after distcp (as SequenceFile format), then run a CTAS statement to convert it to ORC within Hive itself.
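The Hive-side conversion would look something like this (placeholder names, assuming item.name was re-created as SequenceFile over the distcp'd files):

    -- Convert to ORC with CTAS:
    CREATE TABLE item.name_orc
    STORED AS ORC
    AS SELECT * FROM item.name;

    -- Swap the converted table into place:
    DROP TABLE item.name;
    ALTER TABLE item.name_orc RENAME TO item.name;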