I have a table where I store information about a customer together with the timestamp and time range of an event.

The indices I use look as follows:

event_index (customer_id, time)

state_index (customer_id, end, start desc)
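
Spelled out as DDL, those correspond roughly to the following (a sketch - the events table name is an assumption, and "end" has to be quoted since it's a reserved word):

-- "events" is an assumed table name; "end" is a reserved word, so it needs quoting
CREATE INDEX event_index ON events (customer_id, "time");
CREATE INDEX state_index ON states (customer_id, "end", "start" DESC);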

The vast majority of queries query the last few days about state and events.

This is a sample query (events have an identical problem to the one I'll describe for states):

SELECT "states".*
FROM "states"
WHERE ("states"."customer_id" = $1 AND "states"."start" < $2)
       AND ("states"."end" IS NULL OR "states"."end" > $3)
       AND ("states"."obsolete" = $4)
ORDER BY "states"."start" DESC

I see that sometimes the query planner uses only the customer_id to filter, and then filters all of the customer's rows with a heap scan:

Sort  (cost=103089.00..103096.17 rows=2869 width=78)
  Sort Key: start DESC
  ->  Bitmap Heap Scan on states  (cost=1222.56..102924.23 rows=2869 width=78)
        Recheck Cond: (customer_id = '----'::bpchar)
        Filter: ((NOT obsolete) AND ((start)::double precision < '1557711009'::double precision) AND ((end IS NULL) OR ((end)::double precision > '1557666000'::double precision)))
        ->  Bitmap Index Scan on states_index  (cost=0.00..1221.85 rows=26820 width=0)
              Index Cond: (customer_id = '----'::bpchar)

This is in contrast to what I see when I run the query manually in a session:

Sort Key: start DESC
Sort Method: quicksort  Memory: 25kB
->  Bitmap Heap Scan on states  (cost=111.12..9338.04 rows=1 width=78) (actual time=141.674..141.674 rows=0 loops=1)
      Recheck Cond: (((customer_id = '-----'::bpchar) AND (end IS NULL) AND (start < '1557349200'::numeric)) OR ((customer_id = '----'::bpchar) AND (end > '1557249200'::numeric) AND (start < '1557349200'::numeric)))
      Filter: ((NOT obsolete) AND ((title)::text = '---'::text))
      Rows Removed by Filter: 112
      Heap Blocks: exact=101
      ->  BitmapOr  (cost=111.12..111.12 rows=2333 width=0) (actual time=4.198..4.198 rows=0 loops=1)
            ->  Bitmap Index Scan on states_index  (cost=0.00..4.57 rows=1 width=0) (actual time=0.086..0.086 rows=0 loops=1)
                  Index Cond: ((customer_id = '----'::bpchar) AND (end IS NULL) AND (start < '1557349200'::numeric))
            ->  Bitmap Index Scan on state_index  (cost=0.00..106.55 rows=2332 width=0) (actual time=4.109..4.109 rows=112 loops=1)
                  Index Cond: ((customer_id = '---'::bpchar) AND (end > '1557262800'::numeric) AND (start < '1557349200'::numeric))

In other words - the query planner sometimes chooses to use only the first column of the index, which slows the query down significantly.

I can see why it makes sense to just fetch all of the customer's data when it's small enough and filter in memory, but the problem is that this data is very sparse and probably not entirely cached (data from a year ago is probably not cached for the customer; the database is a few hundred GBs). If the index were used to the fullest extent on the timestamps (as in the second example), the result should come back much faster, since recent data is cached.

I created a partial index covering only the last week to see if the query time drops, but Postgres only uses it sometimes. When the partial index is used it solves the problem, since old rows simply don't exist in that index - but sadly Postgres still selects the bigger index even when it doesn't have to. I ran VACUUM ANALYZE, but with no visible effect.
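
The partial index I mean is along these lines (a sketch - the cutoff is an example value, and since the predicate must be a constant, the index has to be recreated periodically to keep covering "the last week"):

-- hypothetical cutoff: roughly "now minus seven days" at creation time
CREATE INDEX states_recent_index
    ON states (customer_id, "end", "start" DESC)
    WHERE "start" > 1557000000;

Worth noting: the planner only picks a partial index when it can prove from the query's constants that the WHERE clause implies the index predicate, so a generic plan for a parameterized query ($1, $2, ...) may not qualify - which could be why it's only used sometimes.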

I tried to check the cache hit ratio using this:

  Database Name   | Temporary files | Size of temporary files |  Block Hits   | Block Reads 
------------------+-----------------+-------------------------+---------------+-------------
 customers        |            1922 |             18784440622 |   69553504584 |  2401546773
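
(Those numbers come from pg_stat_database; a query along the following lines produces that output - "customers" is the database name from above:)

SELECT datname    AS "Database Name",
       temp_files AS "Temporary files",
       temp_bytes AS "Size of temporary files",
       blks_hit   AS "Block Hits",
       blks_read  AS "Block Reads"
  FROM pg_stat_database
 WHERE datname = 'customers';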

And then I calculated (block_hits/(block_hits + block_reads)):

>>> 69553504584.0 / (69553504584.0 + 2401546773.0)
0.9666243477322406

So this shows me a ~96.6% cache hit ratio (I want it much closer to 100% because I know the nature of the queries).

I also tried increasing the statistics target (SET STATISTICS) on customer_id, start and end, since that seemed to be a common suggestion for people facing query planner issues. It didn't help either (and I ran ANALYZE afterwards...).
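
What I ran was along these lines (a sketch - 1000 is an example target, the default being 100):

ALTER TABLE states ALTER COLUMN customer_id SET STATISTICS 1000;
ALTER TABLE states ALTER COLUMN "start" SET STATISTICS 1000;
ALTER TABLE states ALTER COLUMN "end" SET STATISTICS 1000;
ANALYZE states;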

After reading further about this issue, I saw that there is a way to make the query planner prefer index scans by setting random_page_cost lower than the default (4). I also saw a post backing that approach here:

https://amplitude.engineering/how-a-single-postgresql-config-change-improved-slow-query-performance-by-50x-85593b8991b0
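
For example (1.1 is a value often suggested for SSD-backed or mostly-cached databases - an example, not something verified for this workload):

-- per-session, for testing:
SET random_page_cost = 1.1;

-- or persisted for the whole database:
ALTER DATABASE customers SET random_page_cost = 1.1;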

Does this make sense for my use case? Will it make the query planner use the index to the fullest more often (preferably always)?

If not - is there something else I can do to lower the query time? I know partitioning can be very effective, but it seems like overkill and is not fully supported on my current Postgres version (9.5.9), as far as I can tell from what I've read.

Update: After lowering random_page_cost I don't see a conclusive difference. There are still times when the query planner chooses to use only part of the index, with a much slower result.

Any suggestions are very welcome.

Thanks :)
