Hive data processing taking longer than expected

567 views Asked by knowone At 26 December 2016 at 09:53

I'm facing an issue with ORC data in Hive and would appreciate suggestions from anyone who has faced a similar problem.

I have a large amount of data stored in a Hive table (partitioned and stored as ORC). The ORC data is around 4 TB. I'm trying to copy this data into an uncompressed, plain Hive table with the same table structure.

The process seems to run forever and consumes a huge amount of non-DFS storage along the way. So far it has been running for 12 hours and has occupied 130 TB of non-DFS space, which is very abnormal for a Hadoop cluster with 20 servers.

Below are my parameters:

Hadoop running: HDP 2.4
Hive: 0.13
No. of servers: 20 (including 2 NameNodes)
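
To put these figures in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: it assumes that the 18 servers which are not NameNodes all act as worker nodes and share the temporary (non-DFS) data evenly, which may not match the real cluster layout. The variable names are purely illustrative.

```python
# Rough sanity check of the figures quoted in the question.
# Assumption (not stated in the question): every server that is not a
# NameNode is a worker node, and non-DFS data is spread evenly across them.

orc_size_tb = 4.0         # size of the ORC source table
non_dfs_used_tb = 130.0   # non-DFS space occupied so far
hours_elapsed = 12        # how long the copy has been running
total_servers = 20
namenodes = 2

worker_nodes = total_servers - namenodes

# How many times larger the non-DFS footprint is than the ORC input
blowup_factor = non_dfs_used_tb / orc_size_tb

# Average non-DFS data held by each worker under the even-spread assumption
non_dfs_per_worker_tb = non_dfs_used_tb / worker_nodes

# Average rate at which non-DFS usage has grown
growth_tb_per_hour = non_dfs_used_tb / hours_elapsed

print(f"Non-DFS usage is {blowup_factor:.1f}x the ORC input size")
print(f"About {non_dfs_per_worker_tb:.1f} TB of non-DFS data per worker node")
print(f"Average growth of about {growth_tb_per_hour:.1f} TB per hour")
```

Under these assumptions, the temporary footprint is more than thirty times the size of the ORC input, which is the main reason the behavior looks abnormal to me.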

This makes me wonder how a simple join or a typical analytics query on this ORC table would behave. In theory, the ORC format is supposed to improve performance for basic DML queries.

Can someone please tell me whether I'm doing something wrong, or whether this is normal behavior? This is my first experience with ORC data.

For a start, I noticed that the YARN log files being created are huge, and that they consist mostly of error logs.

Thanks

There are 0 answers
