Hive data processing taking longer than expected


I'm facing an issue with ORC data in Hive and would appreciate suggestions from anyone who has faced a similar problem.

I have a large amount of data stored in a Hive table (partitioned, ORC format). The ORC data size is around 4 TB. I'm trying to copy this data to an uncompressed, non-ORC Hive table with the same table structure.
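A copy of this kind would typically look something like the sketch below. It is only an illustration: the table names (`orc_source`, `text_target`), the columns and the partition column `dt` are placeholders rather than the actual schema, and it assumes the target is a plain text table filled through a dynamic-partition insert.

```sql
-- Hypothetical sketch: orc_source, text_target, col_a, col_b and dt are
-- placeholder names, not the real schema.

-- Allow the partition value to be taken from the SELECT output.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Target table with the same structure, stored as uncompressed text
-- (assumption: "normal" table means TEXTFILE).
CREATE TABLE text_target (
  col_a STRING,
  col_b BIGINT
)
PARTITIONED BY (dt STRING)
STORED AS TEXTFILE;

-- Copy every partition; the partition column must come last in the SELECT.
INSERT OVERWRITE TABLE text_target PARTITION (dt)
SELECT col_a, col_b, dt
FROM orc_source;
```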

The process seems to run forever and consumes a huge amount of non-DFS storage along the way. So far it has been running for 12 hours and has occupied 130 TB of non-DFS storage. That seems highly abnormal for a Hadoop cluster with 20 servers.

Below are my parameters:

Hadoop running: HDP 2.4
Hive: 0.13
No. of servers: 20 (2 NN included)

I wonder what a simple join or a normal analytics operation on this ORC table would do. In theory, the ORC format is supposed to improve performance for basic DML queries.

Can someone please tell me whether I'm doing something wrong, or whether this is normal behavior? This is my first experience with ORC data.

As a starting point, I noticed that very large YARN log files are being created, and they consist mostly of error logs.

Thanks
