Pagination by compound key


I am trying to optimize a query that runs on PostgreSQL version 13 or 14.

Let's say we have a table with a corresponding index:

CREATE TABLE journal 
(
    account_id TEXT NOT NULL,
    event_at TEXT NOT NULL,
    id INTEGER PRIMARY KEY
);

CREATE INDEX journal_account_id_event_at_id_idx 
    ON journal (account_id, event_at, id);

INSERT INTO journal VALUES ('A', '2023-01-01', 1);
INSERT INTO journal VALUES ('A', '2023-01-10', 50);
INSERT INTO journal VALUES ('A', '2023-01-30', 15);
INSERT INTO journal VALUES ('A', '2023-03-02', 28);
INSERT INTO journal VALUES ('A', '2023-03-05', 16);
INSERT INTO journal VALUES ('B', '2023-01-01', 101);
INSERT INTO journal VALUES ('B', '2023-01-01', 102);
INSERT INTO journal VALUES ('B', '2023-01-01', 103);
INSERT INTO journal VALUES ('C', '2022-12-01', 2);
INSERT INTO journal VALUES ('C', '2023-01-02', 10);
INSERT INTO journal VALUES ('C', '2023-01-30', 6);
INSERT INTO journal VALUES ('C', '2023-01-30', 20);
INSERT INTO journal VALUES ('C', '2023-02-02', 29);
INSERT INTO journal VALUES ('C', '2023-03-03', 31);

I need to select from said table via pagination. If we knew which account we need to select journal entries for, we could do something like this:

Query #1:

SELECT *
FROM journal
WHERE
    account_id = 'C' -- account condition
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2;  -- page size for pagination

                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.029..0.036 rows=2 loops=1)
   Output: account_id, event_at, id
   ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.026..0.029 rows=2 loops=1)
         Output: account_id, event_at, id
         Index Cond: ((journal.account_id = 'C'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
         Heap Fetches: 2
 Planning Time: 0.204 ms
 Execution Time: 0.085 ms
(8 rows)

This works perfectly fine: Postgres uses the index and, as you can see from the query plan, reads only 2 journal entries in accordance with the limit (the 'Index Only Scan' node has rows=2). Subsequent pages are fetched by feeding the last row of the current page back as the cursor, as sketched below.
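
For example, with the sample data, query #1 returns ('2023-01-02', 10) and ('2023-01-30', 6) for account 'C', so the last returned row becomes the next cursor. A minimal next-page sketch:

SELECT *
FROM journal
WHERE
    account_id = 'C' -- account condition
    AND (event_at, id) > ('2023-01-30', 6) -- cursor = last row of previous page
ORDER BY event_at, id
LIMIT 2;  -- page size for pagination

With the sample data this should return ('2023-01-30', 20) and ('2023-02-02', 29), using the same index-only scan as above.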

However, if we need to select journal entries for multiple accounts, and we don't know the exact accounts in advance, Postgres seems to read all qualifying entries and sort them in order to apply the limit:

Query #2:

WITH accounts AS 
(
    -- let's pretend here is some select from 'accounts' table
    SELECT *
    FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM journal j JOIN accounts a ON a.id = j.account_id -- account condition
WHERE (event_at, j.id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, j.id
LIMIT 2;  -- page size for pagination
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12.36..12.37 rows=1 width=68) (actual time=0.120..0.134 rows=2 loops=1)
   Output: j.account_id, j.event_at, j.id
   ->  Sort  (cost=12.36..12.37 rows=1 width=68) (actual time=0.117..0.125 rows=2 loops=1)
         Output: j.account_id, j.event_at, j.id
         Sort Key: j.event_at, j.id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.14..12.35 rows=1 width=68) (actual time=0.049..0.091 rows=10 loops=1)
               Output: j.account_id, j.event_at, j.id
               ->  Function Scan on pg_catalog.unnest account  (cost=0.00..0.02 rows=2 width=32) (actual time=0.008..0.011 rows=2 loops=1)
                     Output: account.id
                     Function Call: unnest('{A,C}'::text[])
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal j  (cost=0.14..6.15 rows=1 width=68) (actual time=0.020..0.025 rows=5 loops=2)
                     Output: j.account_id, j.event_at, j.id
                     Index Cond: ((j.account_id = account.id) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 10
 Planning Time: 0.307 ms
 Execution Time: 0.188 ms
(17 rows)

We now see from the query plan that the 'Nested Loop' node has rows=10, which means that Postgres read every entry satisfying the pagination condition, only for most of them to be discarded later in the 'Limit' node. My thought is: since the journal table has the compound index (account_id, event_at, id), Postgres could have read at most LIMIT journal entries for each account from the 'accounts' CTE and then merged the per-account streams without even sorting, since within each account the entries are already sorted by (event_at, id), essentially doing something like this:

Query #3:

SELECT *
FROM (
  SELECT *
  FROM journal
  WHERE (account_id = 'A')
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
  ORDER BY event_at, id
  LIMIT 2
) j1
UNION ALL
SELECT *
FROM (
  SELECT *
  FROM journal
  WHERE (account_id = 'C')
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
  ORDER BY event_at, id
  LIMIT 2
) j2
ORDER BY event_at, id
LIMIT 2;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..16.36 rows=2 width=68) (actual time=0.053..0.070 rows=2 loops=1)
   Output: journal.account_id, journal.event_at, journal.id
   ->  Merge Append  (cost=0.28..16.36 rows=2 width=68) (actual time=0.050..0.063 rows=2 loops=1)
         Sort Key: journal.event_at, journal.id
         ->  Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.033..0.038 rows=2 loops=1)
               Output: journal.account_id, journal.event_at, journal.id
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.031..0.033 rows=2 loops=1)
                     Output: journal.account_id, journal.event_at, journal.id
                     Index Cond: ((journal.account_id = 'A'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 2
         ->  Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.014..0.016 rows=1 loops=1)
               Output: journal_1.account_id, journal_1.event_at, journal_1.id
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal journal_1  (cost=0.14..8.15 rows=1 width=68) (actual time=0.012..0.013 rows=1 loops=1)
                     Output: journal_1.account_id, journal_1.event_at, journal_1.id
                     Index Cond: ((journal_1.account_id = 'C'::text) AND (ROW(journal_1.event_at, journal_1.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 1
 Planning Time: 0.362 ms
 Execution Time: 0.132 ms
(18 rows)

Please help me understand how to write a query like query #2 (for multiple accounts) in such a way that it gets a query plan like query #3's, which takes advantage of the already-sorted data and reads less data than query #2.

EDIT #1

Thanks to Laurenz Albe for mentioning "skip scan". I was able to find a possible solution: using a RECURSIVE CTE to emulate a skip scan. I tried this approach on my data and came up with this:

Query #4:

EXPLAIN (ANALYZE, VERBOSE)
WITH RECURSIVE accounts AS (
  -- let's pretend here is some select from 'accounts' table
  SELECT *
  FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
  ORDER BY id
), journal_entries AS (
    (
      SELECT *
      FROM journal
      WHERE
        account_id = (SELECT MIN(id) FROM accounts)
        AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
      ORDER BY event_at, id
      LIMIT 2  -- page size for pagination
    )
    UNION ALL
    SELECT chunk.*
    FROM (
      SELECT account_id FROM journal_entries LIMIT 1
    ) prev CROSS JOIN LATERAL (
      SELECT *
      FROM journal j
      WHERE
        j.account_id = (
          SELECT MIN(id)
          FROM accounts
          WHERE id > prev.account_id
          LIMIT 1
        )
        AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
      ORDER BY event_at, id
      LIMIT 2 -- page size for pagination
    ) chunk
)
SELECT *
FROM journal_entries
ORDER BY event_at, id
LIMIT 2; -- page size for pagination
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=91.42..91.43 rows=2 width=68) (actual time=0.198..0.230 rows=2 loops=1)
   Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
   CTE accounts
     ->  Sort  (cost=0.03..0.04 rows=2 width=32) (actual time=0.028..0.032 rows=2 loops=1)
           Output: account.id
           Sort Key: account.id
           Sort Method: quicksort  Memory: 25kB
           ->  Function Scan on pg_catalog.unnest account  (cost=0.00..0.02 rows=2 width=32) (actual time=0.009..0.011 rows=2 loops=1)
                 Output: account.id
                 Function Call: unnest('{A,C}'::text[])
   CTE journal_entries
     ->  Recursive Union  (cost=0.19..91.05 rows=11 width=68) (actual time=0.082..0.179 rows=4 loops=1)
           ->  Limit  (cost=0.19..8.21 rows=1 width=68) (actual time=0.080..0.090 rows=2 loops=1)
                 Output: journal.account_id, journal.event_at, journal.id
                 InitPlan 2 (returns $2)
                   ->  Aggregate  (cost=0.04..0.06 rows=1 width=32) (actual time=0.044..0.047 rows=1 loops=1)
                         Output: min(accounts.id)
                         ->  CTE Scan on accounts  (cost=0.00..0.04 rows=2 width=32) (actual time=0.030..0.036 rows=2 loops=1)
                               Output: accounts.id
                 ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.078..0.080 rows=2 loops=1)
                       Output: journal.account_id, journal.event_at, journal.id
                       Index Cond: ((journal.account_id = $2) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
                       Heap Fetches: 2
           ->  Nested Loop  (cost=0.19..8.26 rows=1 width=68) (actual time=0.027..0.037 rows=1 loops=2)
                 Output: j.account_id, j.event_at, j.id
                 ->  Limit  (cost=0.00..0.02 rows=1 width=32) (actual time=0.004..0.006 rows=1 loops=2)
                       Output: journal_entries_1.account_id
                       ->  WorkTable Scan on journal_entries journal_entries_1  (cost=0.00..0.20 rows=10 width=32) (actual time=0.002..0.002 rows=1 loops=2)
                             Output: journal_entries_1.account_id
                 ->  Limit  (cost=0.19..8.21 rows=1 width=68) (actual time=0.019..0.025 rows=1 loops=2)
                       Output: j.account_id, j.event_at, j.id
                       InitPlan 3 (returns $4)
                         ->  Limit  (cost=0.05..0.06 rows=1 width=32) (actual time=0.008..0.010 rows=1 loops=2)
                               Output: (min(accounts_1.id))
                               ->  Aggregate  (cost=0.05..0.06 rows=1 width=32) (actual time=0.005..0.007 rows=1 loops=2)
                                     Output: min(accounts_1.id)
                                     ->  CTE Scan on accounts accounts_1  (cost=0.00..0.04 rows=1 width=32) (actual time=0.002..0.003 rows=0 loops=2)
                                           Output: accounts_1.id
                                           Filter: (accounts_1.id > $3)
                                           Rows Removed by Filter: 2
                       ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal j  (cost=0.14..8.15 rows=1 width=68) (actual time=0.007..0.008 rows=1 loops=2)
                             Output: j.account_id, j.event_at, j.id
                             Index Cond: ((j.account_id = $4) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
                             Heap Fetches: 2
   ->  Sort  (cost=0.33..0.36 rows=11 width=68) (actual time=0.196..0.199 rows=2 loops=1)
         Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
         Sort Key: journal_entries.event_at, journal_entries.id
         Sort Method: quicksort  Memory: 25kB
         ->  CTE Scan on journal_entries  (cost=0.00..0.22 rows=11 width=68) (actual time=0.091..0.175 rows=4 loops=1)
               Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
 Planning Time: 0.506 ms
 Execution Time: 0.422 ms
(52 rows)

As we can see, Postgres still sorts the journal entries instead of merging them, but fewer journal entries are read than in query #2: we only read LIMIT entries for each account (see the CTE Scan on journal_entries with rows=4). I guess that's the best I can get out of this.
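
For comparison, the same "read at most page-size entries per account, then take the top of the small union" idea can be expressed without recursion using a LATERAL join. A minimal sketch (I haven't shown its plan, but with the same index the inner LIMIT should likewise stop each per-account index scan after 2 rows):

WITH accounts AS
(
    -- let's pretend here is some select from 'accounts' table
    SELECT *
    FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM accounts a
CROSS JOIN LATERAL
(
    SELECT *
    FROM journal
    WHERE
        account_id = a.id -- account condition
        AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
    ORDER BY event_at, id
    LIMIT 2 -- page size for pagination
) j
ORDER BY j.event_at, j.id
LIMIT 2; -- page size for pagination

Like query #4, this still ends in a top-N sort over the collected rows (at most 2 per account) rather than a Merge Append, but it avoids the recursive CTE machinery.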

1 Answer

Answer by Laurenz Albe (accepted):

An index scan is only efficient if account_id is compared with =. Since PostgreSQL does not have an index "skip scan", you have to rewrite the query like you did.

Not everything that is possible is implemented in the PostgreSQL optimizer.
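
If the account list is only known at run time, one way to apply this advice (a sketch, not part of the answer itself) is to generate query #3's UNION ALL dynamically, e.g. in a PL/pgSQL function; journal_page below is a hypothetical helper:

CREATE FUNCTION journal_page(
    accounts        TEXT[],
    cursor_event_at TEXT,
    cursor_id       INTEGER,
    page_size       INTEGER
) RETURNS SETOF journal
LANGUAGE plpgsql AS
$$
DECLARE
    branches TEXT;
BEGIN
    -- One UNION ALL branch per account: each branch compares
    -- account_id with '=', so it is an ordered, LIMIT-ed index
    -- scan that Merge Append can combine without a sort.
    -- Assumes a non-empty account list.
    SELECT string_agg(
               format('(SELECT * FROM journal
                        WHERE account_id = %L
                          AND (event_at, id) > (%L, %s)
                        ORDER BY event_at, id
                        LIMIT %s)',
                      acct, cursor_event_at, cursor_id, page_size),
               ' UNION ALL ')
      INTO branches
      FROM unnest(accounts) AS t(acct);

    RETURN QUERY EXECUTE
        branches || format(' ORDER BY event_at, id LIMIT %s', page_size);
END;
$$;

-- Usage:
-- SELECT * FROM journal_page(ARRAY['A', 'C'], '2022-12-01', 2, 2);

Since the generated statement is a plain UNION ALL of ordered, limited branches, it should get the same Merge Append plan as query #3.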