We are in the process of migrating a MySQL 5.7 database to PostgreSQL 9.6.
A real issue is the lack of bit_count
function in PostgreSQL. This function is also not available in the upcoming version 10.
Current MySQL code snippet (simplified):
-- mysql specific, tested with 5.7.19
select code,phash,bit_count(phash ^ -9187530158960050433) as hd
from documents
where phash is not null and bit_count(phash ^ -9187530158960050433) < 7
order by hd;
We tried a naive solution (converting the BIGINT to a String and counting the "1"'s), but it performs terribly compared to MySQL.
Java uses a trick from Hacker's Delight, but AFAIK this is not possible with PostgreSQL, because the >>>
operator is (also) not available.
Question: Is a there solution/workaround available comparable with MySQL performance wise?
UPDATE 1
Best solution i could find is based on this SO answer:
First create bit_count
function:
CREATE OR REPLACE FUNCTION bit_count(value bigint)
RETURNS numeric
AS $$ SELECT SUM((value >> bit) & 1) FROM generate_series(0, 63) bit $$
LANGUAGE SQL IMMUTABLE STRICT;
Now we can use almost the same SQL as with MySQL:
-- postgresql specific, tested with 9.6.5
select code,phash,bit_count(phash # -9187530158960050433) as hd
from documents
where phash is not null and bit_count(phash # -9187530158960050433) < 7
order by hd;
UPDATE 2
Based on @a_horse_with_no_name comment, i tried this function:
-- fastest implementation so far. 10 - 11 x faster than the naive solution (see UPDATE 1)
CREATE OR REPLACE FUNCTION bit_count(value bigint)
RETURNS integer
AS $$ SELECT length(replace(value::bit(64)::text,'0','')); $$
LANGUAGE SQL IMMUTABLE STRICT;
However, this is still 5 - 6 times slower than MySQL (tested wit exact the same data set of 200k phash values on the same hardware).
Function bit_count is available since PostgreSQL version 14, see Bit String Functions and Operators.
Example:
Result is 3.
Note that the function is defined for types bit and bit varying. So if you want to use it with integer values, you need to cast.
Example:
Result is B'1101'.
Combining both examples: