I have a table
T (A int, B int, C long, D varchar)
partitioned by each A and sub-partitioned by each B (i.e. list partitions with a single value each). A has cardinality of <10 and B has cardinality of <100. T has about 6 billion rows.
When I run the query
select distinct B from T where A = 1;
it prunes the top-level partitions (those where A != 1) but performs a table scan on all sub-partitions to find distinct values of B. I thought it would know, based on the partition design, that it would only have to check the partition constraint to determine the possible values of B given A, but alas, that is not the case.
There are no indexes on A or B, but there is a primary key on (C,D) at each partition, which seems immaterial, but figured I should mention it. I also have a BRIN index on C. Any idea why the Postgres query planner is not consulting the sub-partition constraints to avoid the table scan?
The reason is that nobody implemented such an optimization in the query planner. I cannot say that that surprises me, since it is a rather unusual query. Every such optimization built into the optimizer would mean that each query on a partitioned table that has a
DISTINCTwould need some extra query planning time, while only few queries would profit. Apart from the expense of writing and maintaining the code, that would be a net loss for most users.Maybe you could use a metadata query: