How to use LIMIT and OFFSET when joining one to many tables with postgresql?


I am using sqlalchemy core with a postgres database, and am implementing the JSON:API spec.

For pagination I am just using basic LIMIT and OFFSET. So for example with the following two tables:

session
id
name

appearance
id
date
session_id

For session I can simply paginate it like:

SELECT id, name FROM session LIMIT 20 OFFSET 40

Our database can only grow so large because of limits on how long the data can be kept, so I am not worried about the slowdowns the LIMIT/OFFSET method can cause. In the example above it works fine: the number of records returned always equals the limit, or is less on the last page. My problem comes when I join a one-to-many table:

SELECT s.id, s.name, a.date FROM session s JOIN appearance a on s.id = a.session_id LIMIT 3 OFFSET 0

Because two (or more) appearance rows may reference the same session, I may end up with something like:

s.id     s.name                 a.date
1        FirstSesh              24/04/14
1        FirstSesh              01/01/20
2        Hello                  09/09/10

Now I have returned only one complete session, plus possibly all of another or just part of it.
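To make the truncation concrete, here is a minimal sketch that reproduces it. It assumes an in-memory SQLite database as a stand-in for PostgreSQL, ISO-formatted dates, and a hypothetical second appearance for session 2 that is not in the table above.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL for this demonstration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appearance (id INTEGER PRIMARY KEY, date TEXT, session_id INTEGER);
    INSERT INTO session VALUES (1, 'FirstSesh'), (2, 'Hello');
    INSERT INTO appearance VALUES
        (1, '2014-04-24', 1),
        (2, '2020-01-01', 1),
        (3, '2010-09-09', 2),
        (4, '2011-03-05', 2);  -- hypothetical extra row for session 2
""")

rows = conn.execute("""
    SELECT s.id, s.name, a.date
    FROM session s
    JOIN appearance a ON s.id = a.session_id
    ORDER BY s.id, a.id
    LIMIT 3 OFFSET 0
""").fetchall()

# LIMIT counts joined rows, not sessions, so session 2 is cut off
# after its first appearance.
session_ids = [row[0] for row in rows]
print(rows)
```

Session 1 comes back whole, while session 2 loses its second appearance, which is exactly the partial result described above.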

The first solution I thought of was doing:

SELECT s.id, s.name, a.date FROM (SELECT id, name FROM session LIMIT 3 OFFSET 0) s JOIN appearance a on s.id = a.session_id

But that now limits my ability to use WHERE or ORDER BY on s, since the subquery has already been cut down to three rows. And I can't just put all those conditions inside the subquery, because of the way I've built up the relationships part of JSON:API, and because I also want to restrict which sessions are returned based on the joined table, e.g. only sessions whose appearance.date is before 2012, which would cause the same issue.

For reference each type in my JSON:API setup has its own query, and then when used in relationships those queries are used as subqueries, which allows simple recursive relationships and simple implementation of new relationships.

If I could instead apply LIMIT and OFFSET to groups of rows, say one group per session id, then I think that might work. But I'm not sure how I'd do that.


There is 1 answer

Answered by GMB

You can use window functions. For example, this gives you the first 3 sessions (ordered by id), along with all the corresponding appearances (no matter how many of them match).

SELECT s.id, s.name, a.date
FROM (
    -- number the sessions before joining; ROW_NUMBER() starts at 1
    SELECT s.*, ROW_NUMBER() OVER (ORDER BY s.id) AS rn
    FROM session s
) s
INNER JOIN appearance a ON s.id = a.session_id
WHERE s.rn BETWEEN 1 AND 3
ORDER BY s.rn, a.date

You can then "paginate" through the result set by changing the bounds of the BETWEEN condition, e.g. BETWEEN 4 AND 6 for the second page of three sessions.
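As a rough sketch of how the BETWEEN bounds could be derived from a page number, the snippet below wraps the query in a hypothetical helper, fetch_page. It assumes an in-memory SQLite database (3.25 or newer, for window functions) as a stand-in for PostgreSQL, and the sample rows are made up for illustration.

```python
import sqlite3

PAGE_SQL = """
    SELECT s.id, s.name, a.date
    FROM (
        SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rn
        FROM session
    ) s
    INNER JOIN appearance a ON s.id = a.session_id
    WHERE s.rn BETWEEN ? AND ?
    ORDER BY s.rn, a.date
"""


def fetch_page(conn, page, page_size):
    """Hypothetical helper: joined rows for the 1-based `page` of sessions."""
    first = (page - 1) * page_size + 1  # ROW_NUMBER() starts at 1
    last = page * page_size
    return conn.execute(PAGE_SQL, (first, last)).fetchall()


# Assumption: in-memory SQLite with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appearance (id INTEGER PRIMARY KEY, date TEXT, session_id INTEGER);
    INSERT INTO session VALUES (1, 'FirstSesh'), (2, 'Hello'), (3, 'Third');
    INSERT INTO appearance VALUES
        (1, '2014-04-24', 1), (2, '2020-01-01', 1),
        (3, '2010-09-09', 2), (4, '2011-03-05', 2),
        (5, '2015-06-30', 3);
""")

page_1 = fetch_page(conn, page=1, page_size=2)  # sessions 1 and 2, all appearances
page_2 = fetch_page(conn, page=2, page_size=2)  # session 3
print(page_1)
print(page_2)
```

Note that with an INNER JOIN a session without any appearances still occupies a row number but contributes no rows to its page; a LEFT JOIN would keep such sessions visible.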

Edit

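Alternatively, if you also need to filter on columns of the joined table, rank the joined rows with DENSE_RANK(). Every row of the same session gets the same rank, so the BETWEEN bounds still count sessions rather than joined rows, and only sessions that survive the filter are counted: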

SELECT id, name, date
FROM (
    SELECT s.id, s.name, a.date,
        DENSE_RANK() OVER (ORDER BY s.id) AS rn
    FROM session s
    INNER JOIN appearance a ON s.id = a.session_id
    -- example filter only; is_admin is not in the question's schema, substitute your own condition
    WHERE a.is_admin = 1
) c
WHERE rn BETWEEN 1 AND 3
ORDER BY rn, date
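The following sketch applies the DENSE_RANK() variant to the filter mentioned in the question (appearances before 2012) instead of is_admin. It assumes an in-memory SQLite database (3.25 or newer) as a stand-in for PostgreSQL, ISO-formatted dates, a hypothetical helper fetch_filtered_page, and made-up sample rows.

```python
import sqlite3

FILTERED_PAGE_SQL = """
    SELECT id, name, date
    FROM (
        SELECT s.id, s.name, a.date,
               DENSE_RANK() OVER (ORDER BY s.id) AS rn
        FROM session s
        INNER JOIN appearance a ON s.id = a.session_id
        WHERE a.date < ?
    ) c
    WHERE rn BETWEEN ? AND ?
    ORDER BY rn, date
"""


def fetch_filtered_page(conn, before, page, page_size):
    """Hypothetical helper: page over sessions that have an appearance before `before`."""
    first = (page - 1) * page_size + 1  # DENSE_RANK() starts at 1
    last = page * page_size
    return conn.execute(FILTERED_PAGE_SQL, (before, first, last)).fetchall()


# Assumption: in-memory SQLite with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appearance (id INTEGER PRIMARY KEY, date TEXT, session_id INTEGER);
    INSERT INTO session VALUES
        (1, 'FirstSesh'), (2, 'Hello'), (3, 'Third'), (4, 'Fourth');
    INSERT INTO appearance VALUES
        (1, '2014-04-24', 1), (2, '2020-01-01', 1),
        (3, '2010-09-09', 2), (4, '2011-03-05', 2),
        (5, '2009-06-01', 3), (6, '2015-01-01', 3),
        (7, '2008-02-02', 4);
""")

# Session 1 has no appearance before 2012, so it is filtered out before ranking
# and does not consume a slot on any page.
page_1 = fetch_filtered_page(conn, "2012-01-01", page=1, page_size=2)
page_2 = fetch_filtered_page(conn, "2012-01-01", page=2, page_size=2)
print(page_1)
print(page_2)
```

Because the WHERE clause runs before the window function, the ranks are assigned only to sessions that pass the filter, so each page holds up to page_size matching sessions regardless of how many rows each one contributes.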