6 top min average elements postgresql

74 views Asked by At

I need to count average price and group them by 2 columns. Then select top 2 values (PostgreSQL 10.1). E.g, I have the following structures:

------------------------------------------------------------------------------------------
        category        | shop_name |     price |      date     |
MSI GeForce RTX 2080    |amazon     |   62649   |   1/6/2019    |   
MSI GeForce RTX 2080    |amazon     |   58668   |   1/17/2019   |   
MSI GeForce RTX 2080    |amazon     |   62649   |   1/7/2019    |   
MSI GeForce RTX 2080    |amazon     |   60542   |   1/16/2019   |   
MSI GeForce RTX 2080    |amazon     |   62649   |   1/5/2019    |   
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   
MSI GeForce RTX 2080    |brandstar  |   66347   |   1/17/2019   |   
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   
MSI GeForce RTX 2080    |brigo      |   63300   |   1/17/2019   |   
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/16/2019   |
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/17/2019   |
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/18/2019   |

Suppose I want to select top 2 average results for category and shop_name. So I expect the following result:


        category        | shop_name |     price |      date     |     avg   |
MSI GeForce RTX 2080    |amazon     |   62649   |   1/6/2019    |   61431.4 |1
MSI GeForce RTX 2080    |amazon     |   58668   |   1/17/2019   |   61431.4 |1  
MSI GeForce RTX 2080    |amazon     |   62649   |   1/7/2019    |   61431.4 |1  
MSI GeForce RTX 2080    |amazon     |   60542   |   1/16/2019   |   61431.4 |1  
MSI GeForce RTX 2080    |amazon     |   62649   |   1/5/2019    |   61431.4 |1  
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |  3
MSI GeForce RTX 2080    |brandstar  |   66347   |   1/17/2019   |   66419.66667 |  3
MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |  3
MSI GeForce RTX 2080    |brigo      |   63300   |   1/17/2019   |   64653.33333 |  2
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |  2
MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |  2
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/16/2019   |   65330   | 1
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/17/2019   |   65330   | 1
MSI GeForce RTX 2070    | fake_shop |   65330   |   1/18/2019   |   65330   | 1

Then i'd like to choose rows where rank is less than 3.

But I get the following result:

    ---------------------------------------------------------------------------------------------
    MSI GeForce RTX 2080    |amazon     |   62649   |   1/6/2019    |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   58668   |   1/17/2019   |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   62649   |   1/7/2019    |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   60542   |   1/16/2019   |   61431.4 |   1   |
    MSI GeForce RTX 2080    |amazon     |   62649   |   1/5/2019    |   61431.4 |   1   |
    MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |   1   |
    MSI GeForce RTX 2080    |brandstar  |   66347   |   1/17/2019   |   66419.66667 |   1   |
    MSI GeForce RTX 2080    |brandstar  |   66456   |   1/16/2019   |   66419.66667 |   1   |
    MSI GeForce RTX 2080    |brigo      |   63300   |   1/17/2019   |   64653.33333 |   1   |
    MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |   1   |
    MSI GeForce RTX 2080    |brigo      |   65330   |   1/16/2019   |   64653.33333 |   1   |
    MSI GeForce RTX 2070    | fake_shop |   65330   |   1/16/2019   |   65330   | 1
    MSI GeForce RTX 2070    | fake_shop |   65330   |   1/17/2019   |   65330   | 1
    MSI GeForce RTX 2070    | fake_shop |   65330   |   1/18/2019   |   65330   | 1

Here is my SQL query:

SELECT tt.category,
       tt.shop_name,
       tt.price,
       tt.updated,
       tt.avg_price,
       rank() OVER (PARTITION BY tt.category,
                                 tt.shop_name,
                                 tt.avg_price
                    ORDER BY tt.avg_price DESC)
FROM
  ( SELECT category,
           LOWER(shop_name) AS shop_name,
           CAST (price AS INTEGER) AS price,
                DATE(updated) AS updated,
                avg(price) OVER (PARTITION BY category,
                                              LOWER(shop_name)) AS avg_price
   FROM prices ) AS tt
2

There are 2 answers

3
Salman A On BEST ANSWER

Just use AVG() OVER () followed by DENSE_RANK():

WITH cte1 AS (
    SELECT *, AVG(price) OVER (PARTITION BY category, shop_name) AS avg_price
    FROM prices
), cte2 AS (
    SELECT *, DENSE_RANK() OVER (PARTITION BY category ORDER BY avg_price) AS rnk
    FROM cte1
)
SELECT *
FROM cte2
WHERE rnk <= 2
ORDER BY category, shop_name
2
Gordon Linoff On

I think you want:

select tt.category,  tt.shop_name, tt.price,  tt.updated, tt.avg_price,
       dense_rank() over (partition by tt.category order by tt.avg_price desc)
from (select category, lower(shop_name) as shop_name,
             (price::int) as price, updated::date as updated,
             avg(price) over (partition by category,  lower(shop_name)) as avg_price 
      from prices
     ) tt

I simplified some of the logic, but the main change is the partition by for rank(). You seem to want the ranking for each store. dense_rank() is also more appropriate.

If you want to differentiate categories with the same overage price:

       dense_rank() over (partition by tt.shop_name order by tt.avg_price desc, category)