Counting the number of '1' values in each bit position of a Redshift column

I have a BIGINT column in my Redshift table, and I want a query that will:

  1. Count how many times the value '1' appears in each bit position, across the binary representations of all rows in this column.
  2. Return the result in a form that lets me take the top x bit positions.

For example (I'm already writing the integer values as binary to simplify the example):

column
--------
11011110  = 222
00000000  = 0
11111100  = 252
00011000  = 24
11111100  = 252
00011000  = 24
11000010  = 194

76543210 <- bit_position

The query should return a table like:

bit_position   count
0              0
1              2
2              3
3              5
4              5
5              2
6              4
7              4

In this case I'll be able to get the top five bit positions: (3,4,6,7,2)

Note: I might have up to 64 bit positions for a column.

There is 1 answer

Answer by Joe Harris

You can use the bitwise AND operator (&) to test each position: bit_col & mask is nonzero exactly when the bit selected by mask is set.

Here's an example that returns the counts across a single row, one column per bit:

SELECT SUM(CASE WHEN bit_col & 64 > 0 THEN 1 ELSE 0 END) "1000000"
     , SUM(CASE WHEN bit_col & 32 > 0 THEN 1 ELSE 0 END) "0100000"
     , SUM(CASE WHEN bit_col & 16 > 0 THEN 1 ELSE 0 END) "0010000"
     , SUM(CASE WHEN bit_col & 8 > 0 THEN 1 ELSE 0 END)  "0001000"
     , SUM(CASE WHEN bit_col & 4 > 0 THEN 1 ELSE 0 END)  "0000100"
     , SUM(CASE WHEN bit_col & 2 > 0 THEN 1 ELSE 0 END)  "0000010"
     , SUM(CASE WHEN bit_col & 1 > 0 THEN 1 ELSE 0 END)  "0000001"
FROM my_table
;
 1000000 | 0100000 | 0010000 | 0001000 | 0000100 | 0000010 | 0000001
---------+---------+---------+---------+---------+---------+---------
      11 |       8 |      11 |      13 |      11 |       9 |       8
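
Since the question mentions up to 64 bit positions, one detail may matter when this pattern is extended to the top of a BIGINT: the highest bit (position 63) is the sign bit, so the masked value is negative there and a "> 0" test would not match it. A minimal sketch of how the two highest positions could be counted, assuming the same my_table and bit_col as above (the aliases bit_62 and bit_63 are only illustrative):

```sql
-- Sketch only: counts for the two highest bit positions of a BIGINT column.
SELECT SUM(CASE WHEN (bit_col & 4611686018427387904) <> 0 THEN 1 ELSE 0 END) AS bit_62  -- mask is 2^62
     , SUM(CASE WHEN bit_col < 0 THEN 1 ELSE 0 END) AS bit_63                           -- sign bit set means the value is negative
FROM my_table
;
```

Comparing with <> 0 instead of > 0 gives the same result for the lower bits, so it can be used for every position if a single uniform expression is preferred.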

To return the results in a single column, with one row per bit, combine one query per bit with UNION ALL:

-- "position" is numbered from the left-most bit tested (1 = mask 64, 7 = mask 1)
          SELECT 1 AS "position", SUM(CASE WHEN bit_col & 64 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 2 AS "position", SUM(CASE WHEN bit_col & 32 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 3 AS "position", SUM(CASE WHEN bit_col & 16 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 4 AS "position", SUM(CASE WHEN bit_col &  8 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 5 AS "position", SUM(CASE WHEN bit_col &  4 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 6 AS "position", SUM(CASE WHEN bit_col &  2 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
UNION ALL SELECT 7 AS "position", SUM(CASE WHEN bit_col &  1 > 0 THEN 1 ELSE 0 END) AS bit_count FROM my_table
ORDER BY bit_count DESC
;
 position | bit_count
----------+-----------
        6 |         6
        7 |         6
        4 |         4
        5 |         4
        2 |         0
        3 |         0
        1 |         0
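
As a possible alternative to writing one UNION ALL branch per bit, the positions can be generated in a small CTE and cross joined with the table. This is a sketch rather than a tested Redshift query: it assumes the >> shift operator listed on the Redshift operator page works on a BIGINT column with a per-row shift amount, and the CTE names bits and positions are made up for illustration. Here bit_position counts from the least significant bit (0), as in the question, not from the left as in the query above.

```sql
WITH bits AS (
    -- The two possible values of a binary digit
    SELECT 0 AS v UNION ALL SELECT 1
),
positions AS (
    -- Six binary digits enumerate every bit position from 0 to 63
    SELECT b0.v + 2 * b1.v + 4 * b2.v + 8 * b3.v + 16 * b4.v + 32 * b5.v AS bit_position
    FROM bits b0
    CROSS JOIN bits b1
    CROSS JOIN bits b2
    CROSS JOIN bits b3
    CROSS JOIN bits b4
    CROSS JOIN bits b5
)
SELECT p.bit_position
     -- Shift the wanted bit down to position 0, then mask it to get 0 or 1
     , SUM((t.bit_col >> p.bit_position) & 1) AS bit_count
FROM my_table t
CROSS JOIN positions p
GROUP BY p.bit_position
ORDER BY bit_count DESC, p.bit_position
LIMIT 5   -- the "x" in "top x bit positions"
;
```

The cross join multiplies the rows processed by 64, so on a large table the fixed list of SUM expressions may still be the cheaper option.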

See the Redshift documentation on operator symbols: http://docs.aws.amazon.com/redshift/latest/dg/r_OPERATOR_SYMBOLS.html

EDIT: If you would like something more dynamic, you will need to look into using a UDF. You could start with my f_bitwise_to_string UDF as a template and add what you need from there: https://github.com/awslabs/amazon-redshift-udfs/blob/master/scalar-udfs/f_bitwise_to_string.sql