I am using SQLAlchemy Core with a PostgreSQL database, and am implementing the JSON:API spec.
For pagination I am just using basic LIMIT and OFFSET. So for example with the following two tables:
session: id, name
appearance: id, date, session_id
For session I can simply paginate it like:
SELECT id, name FROM session LIMIT 20 OFFSET 40
Our database can only grow so large due to limits on how long the data can be kept, so I am not worried about the slowdowns the LIMIT/OFFSET method can create. In the above example it works fine: the number of records is always equal to the limit, or less if at the end. But my problem comes when I have:
SELECT s.id, s.name, a.date FROM session s JOIN appearance a on s.id = a.session_id LIMIT 3 OFFSET 0
Because there may be, say, two appearance rows referencing the same session, I may end up with something like:
s.id  s.name     a.date
1     FirstSesh  24/04/14
1     FirstSesh  01/01/20
2     Hello      09/09/10
Now I have returned only one complete session, plus possibly another or just part of another, instead of three sessions.
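To make the effect concrete, here is a minimal sketch that reproduces the joined query above. It assumes Python's built-in sqlite3 as a stand-in for Postgres, and the table contents (including the ISO-formatted dates and the extra rows) are invented for illustration. An ORDER BY is added only so the result is deterministic.

```python
import sqlite3

# SQLite stands in for Postgres here; the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE appearance (
        id INTEGER PRIMARY KEY,
        date TEXT,
        session_id INTEGER REFERENCES session(id)
    );
    INSERT INTO session VALUES (1, 'FirstSesh'), (2, 'Hello'), (3, 'Third');
    INSERT INTO appearance (date, session_id) VALUES
        ('2014-04-24', 1), ('2020-01-01', 1),
        ('2010-09-09', 2), ('2011-03-03', 2),
        ('2015-05-05', 3);
""")

# LIMIT counts joined rows, not sessions, so a page of 3 rows
# can cover fewer than 3 sessions and cut one off part-way.
rows = conn.execute("""
    SELECT s.id, s.name, a.date
    FROM session s
    JOIN appearance a ON s.id = a.session_id
    ORDER BY s.id, a.id
    LIMIT 3 OFFSET 0
""").fetchall()

session_ids = sorted({row[0] for row in rows})
print(rows)
print(session_ids)
```

With this data the page holds both appearances of session 1 and only the first appearance of session 2; session 3 is not reached at all.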
The first solution I thought of was doing:
SELECT s.id, s.name, a.date FROM (SELECT id, name FROM session LIMIT 3 OFFSET 0) s JOIN appearance a on s.id = a.session_id
But that now limits my ability to use WHERE or ORDER BY on s, since s has already been limited to three rows. And I can't just put all those conditions inside the sub-query, because of the way I've built up the relationships part of JSON:API, and because I also want to restrict whether I return a session based on, e.g., whether the appearance dates are before, say, 2012, which would cause the same issue.
For reference each type in my JSON:API setup has its own query, and then when used in relationships those queries are used as subqueries, which allows simple recursive relationships and simple implementation of new relationships.
If I could instead do something like use LIMIT and OFFSET based on groups of, say, the session ids, then I think that might work? But I'm not sure how I'd do that.
You can use window functions. For example, this gives you the first 3 sessions (ordered by id), along with all the corresponding appearances (no matter how many of them match).
You can then "paginate" through the result set by changing the bounds of the BETWEEN condition.
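A minimal sketch of the window-function idea described above, under stated assumptions: DENSE_RANK() is one window function that fits the description (it gives every joined row of the same session the same rank), and Python's built-in sqlite3 (SQLite 3.25 or newer) stands in for Postgres. The helper names `make_demo_db` and `fetch_page`, the `before` date filter, and the sample rows are hypothetical and only there for illustration.

```python
import sqlite3

def make_demo_db():
    """Build an in-memory database with made-up rows (SQLite stands in for Postgres)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE session (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE appearance (
            id INTEGER PRIMARY KEY,
            date TEXT,
            session_id INTEGER REFERENCES session(id)
        );
        INSERT INTO session VALUES
            (1, 'FirstSesh'), (2, 'Hello'), (3, 'Third'), (4, 'Fourth');
        INSERT INTO appearance (date, session_id) VALUES
            ('2014-04-24', 1), ('2020-01-01', 1),
            ('2010-09-09', 2), ('2011-03-03', 2),
            ('2015-05-05', 3), ('2009-02-02', 4);
    """)
    return conn

# DENSE_RANK() numbers the distinct sessions 1, 2, 3, ... in id order, so every
# joined row of one session shares a rank. The filter runs before ranking, which
# means only sessions that still have matching appearances are counted.
PAGE_QUERY = """
    SELECT id, name, date
    FROM (
        SELECT s.id AS id, s.name AS name, a.date AS date,
               a.id AS appearance_id,
               DENSE_RANK() OVER (ORDER BY s.id) AS session_rank
        FROM session s
        JOIN appearance a ON s.id = a.session_id
        WHERE a.date < :before
    ) ranked
    WHERE session_rank BETWEEN :first AND :last
    ORDER BY id, appearance_id
"""

def fetch_page(conn, page, per_page, before="9999-12-31"):
    """Return all joined rows for the given 1-based page of sessions."""
    params = {
        "first": (page - 1) * per_page + 1,  # lower bound of the BETWEEN
        "last": page * per_page,             # upper bound of the BETWEEN
        "before": before,
    }
    return conn.execute(PAGE_QUERY, params).fetchall()

conn = make_demo_db()
for row in fetch_page(conn, page=1, per_page=3):
    print(row)
```

Moving to the next page only changes the BETWEEN bounds, and each page holds complete sessions regardless of how many appearances they have. In SQLAlchemy Core the rank column could probably be expressed as `func.dense_rank().over(order_by=...)` inside a subquery, though how that fits the per-type query composition described in the question is untested here.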
Alternatively: