Group by limit per group (PostgreSQL)

282 views Asked by At

I have following query:

WITH relationships AS (
    SELECT related_user_id, count(*) AS trade_count
    FROM trade_history
    WHERE user_id = 487834568 
    GROUP BY related_user_id
    ORDER BY trade_count DESC
)
SELECT offers.*,
       relationships.trade_count
FROM offers
         LEFT JOIN user_stock
                   ON user_stock.user_id = 487834568 and offers.product_id = user_stock.product_id
         LEFT JOIN relationships
                   ON offers.user_id = relationships.related_user_id
WHERE offers.state = 'OPEN'
  AND offers.user_id != 487834568        
  AND offers.group BETWEEN 1 AND 3 
ORDER BY offers.created_at,
         relationships.trade_count DESC,
         user_stock.amount NULLS FIRST;

The query shows me all offers and orders them by:

  • Show older offers first
  • Show offers with a higher trade count first (two user traded which each other)
  • User stock

What I need in addition is to limit the results by max 3 per product_id. I did some googling and figured out that this should be possible by a window function using row_number() and lateral joins. I don't want to use row_number() as the tables contain a lot of entries and I would run into performance issues most likely. I guess lateral joins are the right tool (read here http://charlesnagy.info/it/postgresql/group-by-limit-per-group-in-postgresql) but I could not get a decent result with my limited SQL knowledge.

How do I get the grouping in a performant way?

Edit: To make it more clear, I implemented the solution with a window function:

WITH relationships AS (
    SELECT related_user_id, count(*) AS trade_count
    FROM trade_history
    WHERE user_id = 487834568 
    GROUP BY related_user_id
    ORDER BY trade_count DESC
)
SELECT * FROM (
    SELECT 
           offers.*,
           relationships.trade_count,
           row_number() OVER (
               PARTITION BY resource_id
               ORDER BY 
                   offers.created_at,
                   relationships.trade_count DESC,
                   user_stock.amount NULLS FIRST
        ) AS row_number
    FROM offers
             LEFT JOIN user_stock
                       ON user_stock.user_id = 487834568 and offers.product_id = user_stock.product_id
             LEFT JOIN relationships
                       ON offers.user_id = relationships.related_user_id
    WHERE offers.state = 'OPEN'
      AND offers.user_id != 487834568        
      AND offers.group BETWEEN 1 AND 3 
                  ORDER BY row_number
              ) AS ordered_offers
WHERE ordered_offers.row_number <= 3;
1

There are 1 answers

7
mike.k On

If you can add some grouping there, add HAVING count(offers.product_id) <= 3 between WHERE and ORDER BY

Putting it all together, this feels right. I'm going to try a similar grouping based on a different local DB and see if it works as I expect, will update if it doesn't.

WITH relationships AS (
  SELECT related_user_id, count(*) AS trade_count
  FROM trade_history
  WHERE user_id = 487834568 
  GROUP BY related_user_id
  ORDER BY trade_count DESC
)
SELECT
  offers.*,
  relationships.trade_count
FROM offers
LEFT JOIN user_stock ON
  user_stock.user_id = 487834568 AND
  offers.product_id = user_stock.product_id
LEFT JOIN relationships ON
  offers.user_id = relationships.related_user_id
LEFT JOIN (
  SELECT array_agg(offers.id) AS ids
  FROM offers
  JOIN user_stock ON
    user_stock.user_id = 487834568 AND
    offers.user_id = user_stock.user_id AND
    offers.product_id = user_stock.product_id
  GROUP BY offers.user_id
  HAVING count(offers.id) <= 3
) AS offers_limit ON
  offers.id = any(offers_limit.ids)
WHERE
  offers.state = 'OPEN' AND
  offers.user_id != 487834568 AND
  offers.group BETWEEN 1 AND 3
ORDER BY
  offers.created_at,
  relationships.trade_count DESC,
  user_stock.amount NULLS FIRST