I have a table
T (A int, B int, C long, D varchar)
partitioned by each A
and sub-partitioned by each B
(i.e. list partitions with a single value each). A
has cardinality of <10 and B
has cardinality of <100. T
has about 6 billion rows.
When I run the query
select distinct B from T where A = 1;
it prunes the top-level partitions (those where A != 1
) but performs a table scan on all sub-partitions to find distinct values of B
. I thought it would know, based on the partition design, that it would only have to check the partition constraint to determine the possible values of B
given A
, but alas, that is not the case.
There are no indexes on A
or B
, but there is a primary key on (C,D)
at each partition, which seems immaterial, but figured I should mention it. I also have a BRIN index on C
. Any idea why the Postgres query planner is not consulting the sub-partition constraints to avoid the table scan?
The reason is that nobody implemented such an optimization in the query planner. I cannot say that that surprises me, since it is a rather unusual query. Every such optimization built into the optimizer would mean that each query on a partitioned table that has a
DISTINCT
would need some extra query planning time, while only few queries would profit. Apart from the expense of writing and maintaining the code, that would be a net loss for most users.Maybe you could use a metadata query: