Is the compressed size of all bitmap indices for a particular column at most proportional to the size of your table?

125 views Asked by At

I was reading Daniel Lemire's post The Mythical Bitmap Index (http://lemire.me/blog/archives/2008/08/20/the-mythical-bitmap-index/) and in the post he says

the compressed size of a bitmap index is at most proportional to the size of your table! Irrespective of the number of distinct values!

I struggle to see how he calculated this value.

I know that the worst case space usage for a Run-Length-Encoded text of length N is proportional to N (2N?) so O(N).

I also know that the worst case for the number of bitmap indices for a particular column is when the cardinality of the column is N where N is the number of records in the table (so that every record has a unique value in that particular column). This means there will be N bitmap indices.

However, under the worst case assumption for bitmap indices, each bitmap index, when run-length-encoded, will have constant space usage because it'd just be some zeroes, 1, followed by some zeroes, so O(1).

Therefore the total space usage of all the bitmap indices under the highest cardinality N is just N x O(1) = O(N).

However, how do you go from this particular calculation to the worst case for all possible cases? It's not clear to me that the case I described, where cardinality = N, is the worst case space usage for all bitmap indices added together.

How would you calculate the worst case space usage of all run-length-encoded bitmap indices added together for a column in a table?

1

There are 1 answers

0
Danylo Mysak On BEST ANSWER

By the nature of the bitmap index the number of 1s in the whole matrix will not exceed N (and will equal N if columns for all the values are in place). The compressed size of a column with N[i] 1s will be O(N[i]) (with the worst case being 1s and 0s alternate). So, the total size of the compressed columns will not exceed O(sum(N[i])) <= O(N).