PostgreSQL multi-column index including arrays

359 views Asked by At

Docs recommend using GIN index for array columns. However, I want to query by a combination of that column and a boolean column and I cannot add the boolean one to the index as GIN doesn't support that type. Am I better off (a) making a separate index for the boolean column, (b) using a different index type (which?), or (c) not indexing the boolean column given that in my case the result set from a search over the array column index will be just a few rows, so if the query optimizer searches for the matching boolean value within that it will only have a small number of comparisons?

create table foo (
    id integer generated by default as identity primary key,
    ...
    bar bool not null, -- TODO: Separate index? Cannot include bool in GIN index
    ...
    baz smallint[] not null);
create index foo_baz_idx on lambdas using gin (baz);

Where most queries will be in the form of select * from foo where X = any(baz) and bar = Y and there would be at most a small number of rows within a search for X alone

1

There are 1 answers

0
Schwern On BEST ANSWER

It really depends on the nature of your data. If where X = any(baz) results in a small number of rows then there is no need to also index bar.

If where X = any(baz) results in a large number of rows, a separate index on bar may help; it would give the query planner more options. But since it's a boolean you can instead partition the table on bar. Then every query is effectively indexed on where bar = ?.

Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. The partitioning substitutes for leading columns of indexes, reducing index size and making it more likely that the heavily-used parts of the indexes fit in memory.

When queries or updates access a large percentage of a single partition, performance can be improved by taking advantage of sequential scan of that partition instead of using an index and random access reads scattered across the whole table.