I am working with the Parquet file format. For example, take this group of data:
1 2 null 3 4 5 6 null 7 8 null null 9 10 11 12 13 14
I have a bit vector that marks the null elements (1 = non-null, 0 = null):
1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1
and only the non-null elements are stored:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
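For concreteness, here is a minimal sketch of how these two arrays could be built from a nullable column. It assumes the column is held as `std::vector<std::optional<int64_t>>` and that bits are packed MSB-first into `uint64_t` words (element k is bit `63 - k % 64` of word `k / 64`), which is the order the `IntersectBitVec` loop below walks. `NullableColumn` and `SplitNullable` are hypothetical names. This only mirrors the in-memory layout described here; Parquet itself encodes nulls on disk as definition levels, not as this bitmap.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical container: validity bitmap (1 = non-null) plus dense values.
struct NullableColumn {
    std::vector<uint64_t> valid_bits;  // MSB-first: element k -> bit 63 - k % 64
    std::vector<int64_t> values;       // only the non-null values, in order
};

// Hypothetical helper: splits a nullable column into bitmap + dense values.
NullableColumn SplitNullable(const std::vector<std::optional<int64_t>>& column) {
    NullableColumn out;
    out.valid_bits.assign((column.size() + 63) / 64, 0);
    for (std::size_t k = 0; k < column.size(); ++k) {
        if (column[k].has_value()) {
            out.valid_bits[k / 64] |= uint64_t{1} << (63 - k % 64);
            out.values.push_back(*column[k]);
        }
    }
    return out;
}
```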
I want to evaluate the predicate "greater than 5".
Comparing the non-null elements to 5 gives this bit vector:
0 0 0 0 0 1 1 1 1 1 1 1 1 1
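A sketch of that comparison step, under the same MSB-first packing assumption: it writes one bit per non-null value, shifting the 0/1 result of the comparison into place instead of branching. `GreaterThan` is a hypothetical helper, not a Parquet or Arrow API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: evaluates "value > threshold" over the dense
// (non-null) values and packs one result bit per value, MSB-first.
std::vector<uint64_t> GreaterThan(const std::vector<int64_t>& values,
                                  int64_t threshold) {
    std::vector<uint64_t> bits((values.size() + 63) / 64, 0);
    for (std::size_t k = 0; k < values.size(); ++k) {
        // Branch-free: the comparison yields 0 or 1, shifted into position.
        bits[k / 64] |= static_cast<uint64_t>(values[k] > threshold)
                        << (63 - k % 64);
    }
    return bits;
}
```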
What I want is a bit vector covering all elements:
0 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 1 1
The zeros at positions 3, 8, 11 and 12 are the null elements, which must evaluate to false.
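Element by element, the rule is: output bit k is 1 only if element k is non-null and the predicate bit at its rank (the number of non-null elements before k) is 1. A slow but easy-to-check reference version, using one `bool` per element rather than packed words, might look like this (`ExpandPredicate` is a hypothetical name); it is mainly useful as a test oracle for a faster implementation.

```cpp
#include <cstddef>
#include <vector>

// Reference semantics, one bool per element:
//   out[k] = valid[k] && pred[rank(k)], where rank(k) = valid elements before k.
std::vector<bool> ExpandPredicate(const std::vector<bool>& valid,
                                  const std::vector<bool>& pred) {
    std::vector<bool> out(valid.size(), false);  // null elements stay false
    std::size_t rank = 0;  // index of the next unread predicate result
    for (std::size_t k = 0; k < valid.size(); ++k) {
        if (valid[k]) {
            out[k] = pred[rank++];
        }
    }
    return out;
}
```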
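#include <cstddef>
#include <cstdint>
#include <vector>

// bit_vec: one bit per element (1 = non-null), MSB-first in each 64-bit word.
// sub_bit_vec: one predicate bit per non-null element, same order.
// Clears every bit of bit_vec whose element fails the predicate.
// Unsigned words avoid undefined behaviour when shifting into bit 63.
void IntersectBitVec(std::vector<uint64_t>& bit_vec,
                     const std::vector<uint64_t>& sub_bit_vec) {
    std::size_t data_idx = 0;  // current word in sub_bit_vec
    int bit_idx = 63;          // current bit in sub_bit_vec[data_idx]
    for (std::size_t i = 0; i < bit_vec.size(); ++i) {
        for (int j = 63; j >= 0; --j) {
            const uint64_t mask = uint64_t{1} << j;
            if (bit_vec[i] & mask) {  // element is non-null
                if (!(sub_bit_vec[data_idx] & (uint64_t{1} << bit_idx))) {
                    bit_vec[i] &= ~mask;  // predicate is false: clear the bit
                }
                if (--bit_idx < 0) {  // advance to the next predicate word
                    ++data_idx;
                    bit_idx = 63;
                }
            }
        }
    }
}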
My code is quite ugly and tests every bit one at a time. Is there any way to make it faster? Many thanks!
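One way this could be made faster, sketched under the same MSB-first `uint64_t` layout: handle the null bitmap a word at a time. All-null words are skipped, all-valid words take the next 64 predicate bits directly, and mixed words loop only over their set bits rather than all 64 positions. It assumes GCC or Clang for `__builtin_popcountll` and `__builtin_clzll` (C++20's `std::popcount` and `std::countl_zero` are portable equivalents). `IntersectBitVecFast`, `ReadBitsMsbFirst` and `DepositMsbFirst` are hypothetical names, and the sketch has not been benchmarked against the bit-by-bit loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns `n` bits (0..64) of an MSB-first bit stream starting at bit `pos`,
// left-aligned: the first bit read lands in bit 63. Bits below the top `n`
// may hold unrelated stream bits; callers only consume the top `n`.
static uint64_t ReadBitsMsbFirst(const std::vector<uint64_t>& stream,
                                 std::size_t pos, int n) {
    assert(n >= 0 && n <= 64);
    if (n == 0) return 0;
    const std::size_t word = pos / 64;
    const int offset = static_cast<int>(pos % 64);
    uint64_t bits = stream[word] << offset;
    if (offset != 0 && offset + n > 64) {
        bits |= stream[word + 1] >> (64 - offset);  // spill into the next word
    }
    return bits;
}

// Scatters the top popcount(mask) bits of `src`, in order, into the set bits
// of `mask`, walking from the most significant set bit downward
// (an MSB-first counterpart of x86 PDEP, written as a portable loop).
static uint64_t DepositMsbFirst(uint64_t src, uint64_t mask) {
    uint64_t out = 0;
    while (mask != 0) {
        const uint64_t bit = uint64_t{1} << (63 - __builtin_clzll(mask));
        if (src >> 63) out |= bit;  // copy the next predicate bit
        src <<= 1;
        mask &= ~bit;
    }
    return out;
}

// bit_vec: validity bitmap; sub_bit_vec: one predicate bit per non-null
// element. Both MSB-first. On return bit_vec holds valid AND predicate.
void IntersectBitVecFast(std::vector<uint64_t>& bit_vec,
                         const std::vector<uint64_t>& sub_bit_vec) {
    std::size_t pos = 0;  // predicate bits consumed so far
    for (uint64_t& w : bit_vec) {
        if (w == 0) continue;  // all 64 elements null: result stays 0
        const int n = __builtin_popcountll(w);
        const uint64_t src = ReadBitsMsbFirst(sub_bit_vec, pos, n);
        // All valid: predicate bits map one-to-one, so no scatter is needed.
        w = (w == ~uint64_t{0}) ? src : DepositMsbFirst(src, w);
        pos += static_cast<std::size_t>(n);
    }
}
```

If the bit order can be switched to LSB-first (the order Apache Arrow uses for validity bitmaps), the scatter in `DepositMsbFirst` is exactly what the x86 BMI2 instruction PDEP does, so on CPUs that support it `_pdep_u64(src, w)` with a right-aligned `src` could replace the loop.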