How can also show null (or zero) values corresponding to Q1 and Q5 values of column on SQLserver?

84 views Asked by At

I made this exercise on SQL server: write a query that lists for each cluster the quantity of products that fall within it. The company wants to obtain an analysis of sales with respect to the average quantity of each product present in each order, classifying them into six clusters: Q1 (<15), Q2 (15-20), Q3 (21-25), Q4 (26-30), Q5 (31-35), Q6(>35). Write a query that lists, for each product, the product name and the cluster to which it belong. The database is northwind

select count(ProductName) as prod_num ,cluster

from  (
        select  ProductName,
        case 
        when avg(Quantity) < 15 then 'Q1'
        when avg(Quantity) <= 20 then 'Q2'
        when avg(Quantity) between 21 and 25 then 'Q3'
        when avg(Quantity) between 26 and 30 then 'Q4'
        when avg(Quantity) between 31 and 35 then 'Q5'
        else 'Q6'
        end
        as cluster
        from [Order Details] od  join Products pr on od.ProductID=pr.ProductID
        group by  ProductName
        ) as clusters  
group by  cluster
order by cluster
OUTPUT
22  Q2
35  Q3
18  Q4
2   Q6

I also need to display values for Q1 and Q5.

3

There are 3 answers

2
topsail On BEST ANSWER

You can always seed your initial counts, for example:

declare @clusters table (prod_num int, cluster nchar(2));
insert into @clusters values
    (0, 'Q1'),(0, 'Q2'),(0, 'Q3'),(0, 'Q4'),(0, 'Q5'),(0, 'Q6');
    
select 
    t1.cluster,
    t1.prod_num + isnull(t2.prod_num, 0) as prod_num
from 
    @clusters t1
    left join
        (
            select count(ProductName) as prod_num ,cluster
            from  (
                    select  ProductName,
                    case 
                    when avg(Quantity) < 15 then 'Q1'
                    when avg(Quantity) between 15 and 20 then 'Q2'
                    when avg(Quantity) between 21 and 25 then 'Q3'
                    when avg(Quantity) between 26 and 30 then 'Q4'
                    when avg(Quantity) between 31 and 35 then 'Q5'
                    else 'Q6'
                    end
                    as cluster
                    from [Order Details] od  join Products pr on od.ProductID=pr.ProductID
                    group by  ProductName
                    ) as clusters  
            group by  cluster
        ) t2
    on t1.cluster = t2.cluster
order by t1.cluster;

Now we have an initial count of zero for all groups and add to that the counts we found in our query.

Untested so please let me know if you find errors...

3
Stuck at 1337 On

You can make your query much simpler by storing the clusters and the ranges in a table (so that you can reuse it across other similar queries that use the same breakdown). I use a #temp table here but there's no reason this couldn't be a static, permanent dimension table.

CREATE TABLE #clusters(cluster char(2), lo int, hi int,
  INDEX cix_cl CLUSTERED(lo,hi));

INSERT #clusters VALUES('Q1', 0,14),('Q2',15,20),('Q3',21,25),
                       ('Q4',26,30),('Q5',31,35),('Q6',36,2000000000);

SELECT prod_num = COUNT(p.ProductName), cl.cluster
FROM #clusters AS cl
LEFT OUTER JOIN
(
  SELECT pr.ProductName, avgQ = AVG(od.Quantity) 
    FROM dbo.[Order Details] AS od
    INNER JOIN dbo.Products AS pr
    ON od.ProductID = pr.ProductID
    GROUP BY pr.ProductName
) AS p
ON p.avgQ BETWEEN cl.lo AND cl.hi
  GROUP BY cl.cluster;

Working example in this fiddle.

And again, unless ProductName is not unique and you really care about averages across different IDs with the same name as opposed to just product IDs, the join is unnecessary and you could simplify even further:

SELECT prod_num = COUNT(p.ProductID), cl.cluster
FROM #clusters AS cl
LEFT OUTER JOIN
(
  SELECT ProductID, avgQ = AVG(Quantity) 
    FROM dbo.[Order Details]
    GROUP BY ProductID
) AS p
ON p.avgQ BETWEEN cl.lo AND cl.hi
  GROUP BY cl.cluster;
1
Charlieface On

You don't need a temp table or table variable, you can use a virtual VALUES clause to generate all the rows.

You can also significantly simplify this by putting the range numbers into that table also.

select 
    t1.cluster,
    count(t2.AvgQuantity) as prod_num
from (VALUES
  ('Q1', -999999, 15),
  ('Q2', 15, 20),
  ('Q3', 20, 25),
  ('Q4', 25, 30),
  ('Q5', 30, 35),
  ('Q6', 35, 999999)
) t1(cluster, low, hi)
left join (
    select
      ProductName,
      avg(Quantity) as AvgQuantity
    from Products pr
    join [Order Details] od on od.ProductID = pr.ProductID
    group by
      pr.Id,
      pr.ProductName
) t2 on t2.AvgQuantity > t1.low AND t2.AvgQuantity <= t1.hi
group by
  t1.cluster
order by
  t1.cluster;

For efficiency (as well as possibly accuracy), you should also group by the product ID or primary key.

Note that the above query only gets you results for products that were actually sold. To include all products, change the innermost join to a left join