Strange sorting behavior with bigint column via GiST index in PostgreSQL

193 views Asked by At

I'm working on implementing a fast text search in PostgreSQL 12.6 (Ubuntu 20.04.2 VBox) with custom sorting, and I'm using pg_trgm along with GiST (btree_gist) index for sorted output. The idea is to return top 5 matching artists that have the highest number of plays. The index is created like this:

create index artist_name_gist_idx on artist 
  using gist ("name" gist_trgm_ops, total_play_count) where active = true;

"name" here is of type varchar(255) and total_play_count is bigint, no nulls allowed.

When I query the table like this:

select id, name, total_play_count 
from artist 
where name ilike '%the do%' and active = true 
order by total_play_count <-> 40312 
limit 5;

I get the correct result:

   id   |          name           | total_play_count 
--------+-------------------------+------------------
   1757 | The Doors               |             1863
 733226 | Damsel in the Dollhouse |             1095
   9758 | The Doobie Brothers     |             1036
 822805 | The Doubleclicks        |              580
   7236 | Slaughter and the Dogs  |              258

I would get the same result if I replace total_play_count <-> 40312 with simple total_play_count desc, but then I get the additional sort operation that I want to avoid. Number 40312 here is the current maximum value of this column, and table itself contains 1612297 rows in total.

However, since total_play_count is of type bigint, I wanted to make this query more general (and faster) and use the maximum value for bigint, so I don't have to query for the max value every time. But when I update the ORDER BY clause with total_play_count <-> 9223372036854775807, I get the following result:

   id    |          name           | total_play_count 
---------+-------------------------+------------------
    1757 | The Doors               |             1863
  822805 | The Doubleclicks        |              580
    9758 | The Doobie Brothers     |             1036
  733226 | Damsel in the Dollhouse |             1095
 1380763 | Bruce Bawss The Don     |               10

The ordering here is broken, and it's even worse when I try the same approach on another table that has a lot more rows. There are no negative or overly large values, so overflow shouldn't be possible. Results of explain are almost identical:

 Limit  (cost=0.41..6.13 rows=5 width=34)
   ->  Index Scan using artist_name_gist_idx on artist  (cost=0.41..184.44 rows=161 width=34)
         Index Cond: ((name)::text ~~* '%the do%'::text)
         Order By: (total_play_count <-> '9223372036854775807'::bigint)

What could be the issue here? Is this a bug with btree_gist, or am I missing something? I could settle for querying for the max value, but it worries me that there is a threshold that might be reached eventually and break the search, which would be a shame since I'm quite happy with the performance.


Update:

I've tried using regular integer type instead of bigint, and then query with it's upper bound total_play_count <-> 2147483647. It seems that there is no such problem with it. Perhaps using bigint in the first place was somewhat optimistic, but I'll keep the question open in case anyone has an answer or a workaround.

0

There are 0 answers