Bit vector intersection when handling the Parquet file format


I am working with the Parquet file format. For example, take this group of values:

1 2 null 3 4 5 6 null 7 8 null null 9 10 11 12 13 14

I have a bit vector that marks the null elements (0 means null):

1 1 0 1 1 1 1 0 1 1 0 0 1 1 1 1 1 1

and only the non-null elements are stored:

1 2 3 4 5 6 7 8 9 10 11 12 13 14

I want to evaluate a predicate: greater than 5.

Comparing the non-null elements to 5 gives this bit vector:

0 0 0 0 0 1 1 1 1 1 1 1 1 1

I want to get a bit vector that covers all elements, nulls included:

0 0 0 0 0 0 1 0 1 1 0 0 1 1 1 1 1 1

The 0s at the null positions (the 3rd, 8th, 11th, and 12th elements) correspond to null elements and must be false.

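#include <cstddef>
#include <cstdint>
#include <vector>

using std::vector;

// bit_vec: one validity bit per row, MSB first within each 64-bit word.
// sub_bit_vec: one predicate bit per non-null value, MSB first.
// Clears every set bit of bit_vec whose matching bit in sub_bit_vec is 0.
void IntersectBitVec(vector<int64_t>& bit_vec, const vector<int64_t>& sub_bit_vec) {
  const uint64_t one = 1;  // 64-bit constant; shifting the int 0x01 by 32 or more is undefined
  size_t data_idx = 0;     // current word in sub_bit_vec
  int bit_idx = 63;        // current bit within that word
  for (size_t i = 0; i < bit_vec.size(); ++i) {
    for (int j = 63; j >= 0; --j) {
      if (static_cast<uint64_t>(bit_vec[i]) & (one << j)) {
        if (!(static_cast<uint64_t>(sub_bit_vec[data_idx]) & (one << bit_idx))) {
          bit_vec[i] &= static_cast<int64_t>(~(one << j));
        }
        if (--bit_idx < 0) {
          ++data_idx;  // move on to the next word of sub_bit_vec
          bit_idx = 63;
        }
      }
    }
  }
}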

My code is quite ugly. Is there any way to make it faster? Many thanks!
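One possible direction, offered as a minimal sketch rather than a definitive answer: the operation described above is a "bit deposit", where the packed predicate bits are scattered into the positions of the set validity bits. Instead of testing all 64 positions of every word, the loop below visits only the set bits. The function name `ExpandPackedBits` is hypothetical, and the sketch assumes unsigned words with LSB-first bit order (bit k of word w describes row 64*w + k), which differs from the MSB-first loop in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: expands `packed` (one bit per non-null value) to one
// bit per row, overwriting `valid` in place.
// Assumption: LSB-first bit order, and `packed` holds at least one bit for
// every set bit in `valid`.
void ExpandPackedBits(std::vector<uint64_t>& valid,
                      const std::vector<uint64_t>& packed) {
  std::size_t src = 0;  // index of the next unread bit in `packed`
  for (uint64_t& word : valid) {
    uint64_t remaining = word;  // validity bits not yet visited
    uint64_t out = 0;
    while (remaining != 0) {
      // Isolate the lowest set bit (two's-complement trick).
      const uint64_t lowest = remaining & (~remaining + 1);
      assert(src / 64 < packed.size());
      if ((packed[src / 64] >> (src % 64)) & 1u) {
        out |= lowest;  // predicate is true for this non-null row
      }
      ++src;
      remaining &= remaining - 1;  // clear the lowest set bit
    }
    word = out;  // null rows stay 0, i.e. false
  }
}
```

With the example data, the validity word is 0x3F37B and the packed predicate word is 0x3FE0 under this bit order. On x86 CPUs with BMI2, the inner loop corresponds to what the `_pdep_u64` intrinsic from `<immintrin.h>` does in one instruction, although the caller still has to feed it the right number of packed bits per word, so treat that as an option to measure rather than a guaranteed win.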
