What Postgres 13 index types support distance searches?

630 views Asked by At

Original Question

We've had great results using a K-NN search with a GiST index with gist_trgm_ops. Pure magic. I've got other situations, with other datatypes like timestamp where distance functions would be quite useful. If I didn't dream it, this is, or was, available through pg_catalog. Looking around, I can't find a way to search on indexes by such properties. I think what I'm after, in this case, is AMPROP_DISTANCE_ORDERABLE under-the-hood.

Just checked, and pg_am did have a lot more attributes than it does now, prior to 9.6.

Is there another way to figure out what options various indexes have with a catalog search?

Catalogs

jjanes' answer inspired me to look at the system information functions some more, and to spend a day in the pg_catalog tables. The catalogs for indexes and operators are complicated. The system information functions are a big help. This piece proved super useful for getting a handle on things:

https://postgrespro.com/blog/pgsql/4161264

I think the conclusion is "no, you can't readily figure out what data types and indexes support proximity searches." The relevant attribute is a property of a column in a specific index. However, it looks like nearest-neighbor searching requires a GiST index, and that there are readily-available index operator classes to add K-NN searching to a huge range of common types. Happy for corrections on these conclusions, or the details below.

Built-in Distance Support

https://www.postgresql.org/docs/current/gist-builtin-opclasses.html

From various bits of the docs, it sounds like there are distance (proximity, nearest neighbor, K-NN) operators for GiST indexes on a handful of built-in geometric types.

box
circle
point
poly

B-tree Operator Classes

Not listed as such in the docs, but visible with this query:

select am.amname AS index_method
                , opc.opcname AS opclass_name
                , opc.opcintype::regtype AS indexed_type
                , opc.opcdefault AS is_default
             from pg_am am
                , pg_opclass opc
            where opc.opcmethod = am.oid 
             and am.amname = 'btree'
         order by 1,2;

B-tree GiST Distance Support

https://www.postgresql.org/docs/current/btree-gist.html

I guess a B-tree is a special case of a GiST, and there's a B-tree operator class to match. The docs say these native types are supported:

int2
int4
int8
float4
float8
timestamp with time zone
timestamp without time zone
time without time zone
date
interval
oid
money

BRIN Built-in Operator Classes

https://www.postgresql.org/docs/current/brin-builtin-opclasses.html

There are over 70 listed in the internals docs.

GIN Built-in Operator Classes

https://www.postgresql.org/docs/12/gin-builtin-opclasses.html

array_ops
jsonb_ops
jsonb_path_ops
tsvector_ops

Alternative Text Opts

https://www.postgresql.org/docs/current/indexes-opclass.html There are special operator classes for text comparisons made character-by-character, rather than through a collation. Or so the docs say:

text_pattern_ops
varchar_pattern_ops
bpchar_pattern_ops

pg_trgm

Beyond this, the included pg_trgm module includes operators for GIN and GiST, with the GiST version optimizing <->. I think this shows up as:

text

Note: Postgres 14 modifies pg_trgm to allow you to adjust the "signature length" for the index entry. Longer is possibly more accurate, shorter signatures are smaller on disk. If you've been using pg_trgm, it might be worth experimenting with the signature length in PG 14.

https://www.postgresql.org/docs/current/pgtrgm.html

SP-GiST Built-in Operator Classes

box_ops
kd_point_ops
network_ops
poly_ops
quad_point_ops
range_ops
text_ops

pg_operator search

Here's a search on pg_operator to look for matches starting from the <-> operator itself:

select oprnamespace::regnamespace::text  as schema_name,
       oprowner::regrole                 as owner,
       oprname                           as operator,
       
       oprleft::regtype                  as left,
       oprright::regtype                 as right,
       oprresult::regtype                as result,
       
       oprcom::regoperator              as commutator
              
 from pg_operator
where oprname = '<->'

order by 1

Output from one of our severs:

| schema_name | owner    | operator | left                        | right                       | result           | commutator                                                   |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+
| extensions  | postgres | <->      | text                        | text                        | real             | <->(text,text)                                               |
| extensions  | postgres | <->      | money                       | money                       | money            | <->(money,money)                                             |
| extensions  | postgres | <->      | date                        | date                        | integer          | <->(date,date)                                               |
| extensions  | postgres | <->      | real                        | real                        | real             | <->(real,real)                                               |
| extensions  | postgres | <->      | double precision            | double precision            | double precision | <->(double precision,double precision)                       |
| extensions  | postgres | <->      | smallint                    | smallint                    | smallint         | <->(smallint,smallint)                                       |
| extensions  | postgres | <->      | integer                     | integer                     | integer          | <->(integer,integer)                                         |
| extensions  | postgres | <->      | bigint                      | bigint                      | bigint           | <->(bigint,bigint)                                           |
| extensions  | postgres | <->      | interval                    | interval                    | interval         | <->(interval,interval)                                       |
| extensions  | postgres | <->      | oid                         | oid                         | oid              | <->(oid,oid)                                                 |
| extensions  | postgres | <->      | time without time zone      | time without time zone      | interval         | <->(time without time zone,time without time zone)           |
| extensions  | postgres | <->      | timestamp without time zone | timestamp without time zone | interval         | <->(timestamp without time zone,timestamp without time zone) |
| extensions  | postgres | <->      | timestamp with time zone    | timestamp with time zone    | interval         | <->(timestamp with time zone,timestamp with time zone)       |
| pg_catalog  | postgres | <->      | box                         | box                         | double precision | <->(box,box)                                                 |
| pg_catalog  | postgres | <->      | path                        | path                        | double precision | <->(path,path)                                               |
| pg_catalog  | postgres | <->      | line                        | line                        | double precision | <->(line,line)                                               |
| pg_catalog  | postgres | <->      | lseg                        | lseg                        | double precision | <->(lseg,lseg)                                               |
| pg_catalog  | postgres | <->      | polygon                     | polygon                     | double precision | <->(polygon,polygon)                                         |
| pg_catalog  | postgres | <->      | circle                      | circle                      | double precision | <->(circle,circle)                                           |
| pg_catalog  | postgres | <->      | point                       | circle                      | double precision | <->(circle,point)                                            |
| pg_catalog  | postgres | <->      | circle                      | point                       | double precision | <->(point,circle)                                            |
| pg_catalog  | postgres | <->      | point                       | polygon                     | double precision | <->(polygon,point)                                           |
| pg_catalog  | postgres | <->      | polygon                     | point                       | double precision | <->(point,polygon)                                           |
| pg_catalog  | postgres | <->      | circle                      | polygon                     | double precision | <->(polygon,circle)                                          |
| pg_catalog  | postgres | <->      | polygon                     | circle                      | double precision | <->(circle,polygon)                                          |
| pg_catalog  | postgres | <->      | point                       | point                       | double precision | <->(point,point)                                             |
| pg_catalog  | postgres | <->      | box                         | line                        | double precision | <->(line,box)                                                |
| pg_catalog  | postgres | <->      | tsquery                     | tsquery                     | tsquery          | 0                                                            |
| pg_catalog  | postgres | <->      | line                        | box                         | double precision | <->(box,line)                                                |
| pg_catalog  | postgres | <->      | point                       | line                        | double precision | <->(line,point)                                              |
| pg_catalog  | postgres | <->      | line                        | point                       | double precision | <->(point,line)                                              |
| pg_catalog  | postgres | <->      | point                       | lseg                        | double precision | <->(lseg,point)                                              |
| pg_catalog  | postgres | <->      | lseg                        | point                       | double precision | <->(point,lseg)                                              |
| pg_catalog  | postgres | <->      | point                       | box                         | double precision | <->(box,point)                                               |
| pg_catalog  | postgres | <->      | box                         | point                       | double precision | <->(point,box)                                               |
| pg_catalog  | postgres | <->      | lseg                        | line                        | double precision | <->(line,lseg)                                               |
| pg_catalog  | postgres | <->      | line                        | lseg                        | double precision | <->(lseg,line)                                               |
| pg_catalog  | postgres | <->      | lseg                        | box                         | double precision | <->(box,lseg)                                                |
| pg_catalog  | postgres | <->      | box                         | lseg                        | double precision | <->(lseg,box)                                                |
| pg_catalog  | postgres | <->      | point                       | path                        | double precision | <->(path,point)                                              |
| pg_catalog  | postgres | <->      | path                        | point                       | double precision | <->(point,path)                                              |
+-------------+----------+----------+-----------------------------+-----------------------------+------------------+--------------------------------------------------------------+

Did I miss any index opts worth knowing about?

Checking Out Live Indexes

Here's a longer-than-it-should-be-because-I-still-find-the-catalogs-confusing query to pull out the columns from each user index, and figure out their more interesting properties. For a nice, short catalog search of much utility, see https://dba.stackexchange.com/questions/186944/how-to-list-all-the-indexes-along-with-their-type-btree-brin-hash-etc

with 
basic_details as (
select relnamespace::regnamespace::text     as schema_name,
       indrelid::regclass::text             as table_name,
       indexrelid::regclass::text           as index_name,
       unnest(indkey)                       as column_ordinal_position , -- WITH ORDINALITY would be nice here, didn't get it working.
       generate_subscripts(indkey, 1) + 1   as column_position_in_index  -- 
                          
  from pg_index 
  join pg_class on pg_class.oid = pg_index.indrelid
),

enriched_details as (

  select basic_details.schema_name,
         basic_details.table_name,
         basic_details.index_name,
         basic_details.column_ordinal_position,
         basic_details.column_position_in_index,
                  
         columns.column_name,
         columns.udt_name     as column_type_name      
  
    from basic_details 
    
    join information_schema.columns as columns 
      on columns.table_schema     = basic_details.schema_name
     and columns.table_name       = basic_details.table_name
     and columns.ordinal_position = basic_details.column_ordinal_position
                     
    where schema_name not like 'pg_%'
  )
  
  select *,
        -- https://postgrespro.com/blog/pgsql/4161264
         coalesce(pg_index_column_has_property(index_name,column_position_in_index,'distance_orderable'), false) as supports_knn_searches,
         coalesce(pg_index_column_has_property(index_name,column_position_in_index,'search_array'), false)       as supports_in_searches,
         coalesce(pg_index_column_has_property(index_name,column_position_in_index,'returnable'), false)         as supports_index_only_scans,
        
        
         (select indexdef 
             from pg_indexes 
            where pg_indexes.schemaname  = enriched_details.schema_name
              and pg_indexes.indexname   = enriched_details.index_name) as index_definition
  
     from enriched_details 
  
 order by supports_in_searches desc,
          schema_name,
          table_name,
          index_name
2

There are 2 answers

1
jjanes On BEST ANSWER

timestamp type supports KNN with GiST indexes using the <-> operator created by the btree_gist extension.

You can check if a specific column of a specific index supports it, like this:

select pg_index_column_has_property('pgbench_history_mtime_idx'::regclass,1,'distance_orderable');
0
Morris de Oryx On

As best as I can tell, here's the state of play as of PG 14:

  • GiST indexes may support nearest-neighbor (K-NN) proximity <--> search, and always have.

  • SP-GiST added such support as of PG 12.

  • RUM indexes (not in core) also support K-NN.

In all cases, support is done in the operator class:

https://www.postgresql.org/docs/current/indexes-opclass.html

That's what determines if distance_orderable works for a specific data type on a specific kind of index. Built-in, some of the geometric and text vector types work out-of-the box. Other than that small set, many more types are supported via specific operator classes, such as:

https://www.postgresql.org/docs/current/btree-gist.html https://www.postgresql.org/docs/current/pgtrgm.html

In the case of SP-GiST, there are a lot fewer types supported than with GiST, once you've installed btree_gist:

https://www.postgresql.org/docs/14/spgist-builtin-opclasses.html

It looks like text_opts and range_opts do not support proximity searches. However, for tsrange, etc., there are likely enough options with other tools.