Given these tables:
TABLE Stores (
store_id INT,
store_name VARCHAR,
etc
);
TABLE Employees (
employee_id INT,
store_id INT,
employee_name VARCHAR,
currently_employed BOOLEAN,
etc
);
I want to list the 15 longest-employed employees for each store (let's say the 15 with lowest employee_id
), or ALL employees for a store if there are 15 who are currently_employed='t'
. I want to do it with a join clause.
I've found a lot of examples of people doing this only for 1 row, usually a min or max (single longest-employed employee), but I want to basically do combine an ORDER BY
and a LIMIT
inside of the join. Some of those examples can be found here:
I've also found decent examples for doing this store-by-store (I don't, I have about 5000 stores):
I've also seen that you can use TOP
instead of ORDER BY
and LIMIT
, but not for PostgreSQL.
I reckon that a join clause between the two tables isn't the only (or even necessarily best way) to do this, if it's possible to just work by distinct store_id
inside of the employees table, so I'd be open to other approaches. Can always join afterwards.
As I'm very new to SQL, I'd like any theory background or additional explanation that can help me understand the principles at work.
row_number()
The general solution to get the top n rows per group is with the window function
row_number()
:PARTITION BY
should usestore_id
, which is guaranteed to be unique (as opposed tostore_name
).First identify rows in
employees
, then join tostores
, that's cheaper.To get 15 rows use
row_number()
notrank()
(would be the wrong tool for the purpose). (Whileemployee_id
is unique, the difference doesn't show.)LATERAL
An alternative since Postgres 9.3 that typically performs (much) better in combination with a matching index, especially when retrieving a small selection from a big table. See:
The perfect index would be a partial multicolumn index like this:
Related example:
Both versions exclude stores without current employees. There are ways around this if needed with
LEFT JOIN LATERAL
...