Hive(Bigdata)- difference between bucketing and indexing

Question


Asked by anusngh on 13 June 2015 at 12:02

What is the main difference between bucketing and indexing of a table in Hive?


There is 1 answer

**dbustosp** · Accepted Answer · 2015-06-13T15:52:17+00:00

The main difference is the goal:

Indexing

The goal of Hive indexing is to speed up lookups on specific columns of a table. Without an index, a query with a predicate such as 'WHERE tab1.col1 = 10' loads the entire table or partition and processes every row. If an index exists on col1, only a portion of the file needs to be loaded and processed.

Indexes matter more as tables grow, and Hive is typically used on very large tables.
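As a rough illustration of the lookup case described above, here is a minimal HiveQL sketch. It reuses tab1 and col1 from the example predicate; the index name idx_tab1_col1 is hypothetical. It assumes a Hive release before 3.0, since index support was removed in Hive 3.0, and whether the optimizer actually uses the index depends on your configuration.

```sql
-- Build a compact index on col1 (index name is made up for this example).
CREATE INDEX idx_tab1_col1
ON TABLE tab1 (col1)
AS 'COMPACT'
WITH DEFERRED REBUILD;

-- With DEFERRED REBUILD the index starts empty, so populate it explicitly.
ALTER INDEX idx_tab1_col1 ON tab1 REBUILD;

-- Allow the optimizer to use indexes when filtering.
SET hive.optimize.index.filter=true;

-- The predicate from the answer: with the index in place, Hive can skip
-- the parts of the data that cannot contain col1 = 10.
SELECT * FROM tab1 WHERE col1 = 10;
```

The index has to be rebuilt after the underlying data changes, which is part of the cost of keeping it.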

Bucketing

Bucketing is mostly used to speed up join operations. Rows are assigned to a fixed number of buckets by hashing a chosen column, such as a 'key' or 'id', so rows with the same key always land in the same bucket. When two tables are bucketed on the join key with compatible bucket counts, Hive can join matching buckets against each other instead of comparing all the data, which makes the join faster. You can also think of bucketing as a technique for decomposing a data set into more manageable parts. The article "5 Tips for efficient Hive queries" covers bucketing as one of its tips.
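The join case can be sketched in HiveQL roughly as follows. All table and column names (users_bucketed, orders_bucketed, users_raw, orders_raw, user_id and so on) are hypothetical, and the bucket count of 32 is an arbitrary choice for illustration, not a recommendation.

```sql
-- Both tables are bucketed on the join key with the same number of buckets.
CREATE TABLE users_bucketed (user_id INT, name STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

CREATE TABLE orders_bucketed (order_id INT, user_id INT, amount DOUBLE)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC;

-- Before Hive 2.0 this setting is needed so that inserts respect the
-- bucket definition; from 2.0 on, bucketing is always enforced.
SET hive.enforce.bucketing=true;

-- Load from (hypothetical) unbucketed staging tables.
INSERT OVERWRITE TABLE users_bucketed
SELECT user_id, name FROM users_raw;

INSERT OVERWRITE TABLE orders_bucketed
SELECT order_id, user_id, amount FROM orders_raw;

-- Let Hive consider a bucket map join, which joins matching buckets.
SET hive.optimize.bucketmapjoin=true;

SELECT u.name, SUM(o.amount) AS total_amount
FROM orders_bucketed o
JOIN users_bucketed u ON o.user_id = u.user_id
GROUP BY u.name;
```

Whether Hive actually picks a bucket map join depends on table sizes and other settings, so treat this as a starting point and check the query plan with EXPLAIN.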
