When is the bloom filter created on a Hive table?


I created a Hive table with bloom filters on 4 different columns and later decided to add a few more using the ALTER command.

But I am not sure how to refresh/regenerate the bloom filter on Hive.

Is the bloom filter created during insertion of data?

Is it created when we gather stats? Column or table level?

Or am I completely off in my understanding of bloom filters, and they are created on the fly?

I have read the documentation and haven't found more information about this. I also tried going through the code to find where the methods are triggered, with no luck.

2

There are 2 answers

4
sandeep rawat (best answer)

In Hive 0.10.0 and later, you can update statistics with the ANALYZE TABLE command.

For example:

ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS;

Note: bloom filters are created while inserting data.

0
fjolt

Is the bloom filter created during insertion of data?

Yes. When rows are inserted into the table, the bloom filter and the index data in the ORC file are created stripe by stripe. For query efficiency, it is recommended to sort the corresponding columns before inserting the data.
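As a rough sketch of what this looks like in practice, the statements below declare bloom filter columns through the ORC table properties and then load sorted data. The table `events`, the source table `events_staging`, and all column names are hypothetical, and the `fpp` value is only an example.

```sql
-- Hypothetical table and column names, for illustration only.
CREATE TABLE events (
  user_id    BIGINT,
  session_id STRING,
  payload    STRING
)
STORED AS ORC
TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'user_id,session_id',  -- columns that get bloom filters
  'orc.bloom.filter.fpp'     = '0.05'                 -- target false positive rate (example value)
);

-- The bloom filters are written into the ORC files by this insert.
-- SORT BY orders rows within each reducer, so equal values cluster together
-- and more row groups can be skipped at query time.
INSERT INTO TABLE events
SELECT user_id, session_id, payload
FROM events_staging
SORT BY user_id;
```

Files written before the property was set are not touched by the `CREATE` or by later inserts; only newly written files carry the filters.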

Is it created when we gather stats? Column or table level?

No. If new columns are added to the bloom filter list, the table data must be reinserted.
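A minimal sketch of that reinsertion, assuming a non-partitioned, non-transactional ORC table with the hypothetical name `events` and a hypothetical newly added bloom filter column `country`:

```sql
-- Hypothetical names throughout; adjust to your own table.
-- Setting the property replaces its previous value, so list every column
-- that should have a bloom filter, not just the new one.
ALTER TABLE events SET TBLPROPERTIES (
  'orc.bloom.filter.columns' = 'user_id,session_id,country'
);

-- Rewrite the existing data so the ORC files are regenerated
-- with bloom filters for the updated column list.
INSERT OVERWRITE TABLE events
SELECT * FROM events;
```

For a partitioned table the rewrite would need to be done per partition or with dynamic partitioning, and it is worth testing on a copy first, since the overwrite replaces the table's files.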