I have table which has a few billion rows. There are two columns in the table:
match_id uuid,
group_id integer
And there are indexes created on both of the above columns:
create index if not exists my_tbl_match_id_idx on my_tbl (match_id);
create index if not exists my_tbl_group_id_idx on my_tbl (group_id);
When I perform following query, both indexes are scanned in parallel.
match_id index look up is much faster and returns result back in 13 milliseconds while group_id index look up is slower and takes 1.3 seconds. Because both indexes are looked up in parallel, overall query time = 1.3 seconds i.e. look up time of group_id index.
Is there any way Postgres can just use group_id index after the match_id result set is computed?
SELECT *
FROM my_table
WHERE match_id = 'e089e0af-543b-45d5-abbf-22c6c5ed9a01'
AND (group_id IN (167,1704,1864,2065,2145,3812,3814,3855,7462,11393,11394,11396))
Or do I need to create a composite index for match_id and group_id?
There are many factors with bearing on the best course of action, most of which are not clear from the question. Consider instructions for postgresql-performance questions here.
Generally speaking, a multicolumn index on
(match_id, group_id)(with leadingmatch_id!) performs substantially better for your query than combining two separate indexes in bitmap index scans.In effect, this index implements your requirement exactly:
A multicolumn index is sorted by leading expressions first. Using this index, Postgres excludes the lion share of irrelevant matches on
group_idfor your query. Should be particularly impactful for your case sincematch_idseems to be much more selective.See:
This index also covers queries on just
match_idlike you mentioned in your comment. Sincematch_idis the more selective and bigger column (uuid= 16 bytes, int = 4 bytes), the multicolumn index isn't much bigger than the one on just(match_id), and should work almost as well for the purpose, making the overhead for maintaining a second index questionable. The other index on just(group_id)may still serve its purpose for queries on justgroup_id.See:
Other advice
Your query can benefit a lot from physically clustering rows on
match_id. UseCLUSTERor one of the less blocking community tools. See:Only reasonable if there isn't too much write activity on the table (or large parts of it).
Also, if you don't really need
SELECT *, but only a small subset of columns, there is potential for optimization.