How can I show the counts of each group in my query as percentages?

185 views Asked by At

I have a column in mytable(Movies), Actors. I am trying to get percentage of times each actor shows up.

CREATE TABLE movies AS SELECT * FROM ( VALUES
  ('Robert DeSouza'),
  ('Tony Wagner'),
  ('Sean Cortese'),
  ('Robert DeSouza'),
  ('Robert DeSouza'),
  ('Tony Wagner'),
  ('Sean Cortese'),
  ('Charles Bastian'),
  ('Robert DeSouza')
) AS t(actors);

The result I am requesting:

select Actors, (some formula * 100) as "The Ratio" from Movies

Actors                       The Ratio
Robert DeSouza                 44%
Tony Wagner                    22%
Sean Cortese                   22%
Charles Bastian                11%
                               100%
4

There are 4 answers

2
Evan Carroll On BEST ANSWER
SELECT actors, floor(count(*) *100 / sum(count(*)) OVER ())
FROM movies
GROUP BY actors
ORDER BY count(*) DESC;

This gets you most of the way there..

     actors      | floor 
-----------------+-------
 Robert DeSouza  |    44
 Tony Wagner     |    22
 Sean Cortese    |    22
 Charles Bastian |    11

Not sure how you're getting 100 in your example. You want the floor of the percentages, and you want it to magically say 100? If 44+22+22+11 = 100 today then it's just one of those days. But we can do that too.

SELECT actors AS "Actors", r::text || '%' AS "The Ratio"
FROM (
  SELECT
    actors AS "Actors",
    floor(count(*) *100 / sum(count(*)) OVER ()) AS r,
    false AS is_total
  FROM movies
  GROUP BY actors
  UNION ALL
    SELECT *
    FROM ( VALUES
      (null, 100, true)
    ) AS t(actors, floor, is_total)
  ORDER BY 3, 2 DESC
) AS t(actors,r);

Output,

     Actors      | The Ratio 
-----------------+-----------
 Robert DeSouza  | 44%
 Tony Wagner     | 22%
 Sean Cortese    | 22%
 Charles Bastian | 11%
                 | 100%

If you don't want to floor you can round()

1
Air On

There is no numeric type that includes a percent sign (the % character) so your problem can't be solved solely by an expression that calculates the numeric value. In addition to calculating that value, you need to format it as text using the to_char() function.

This function takes a numeric value and converts it to a text value using a formatting literal that you provide as the second argument. In this case it looks like what you want to do is to round to the nearest percent and show the percent sign. You probably want to use '990%' as your formatting literal. Adding this to your example table and the window function that Gordon suggested yields:

[local] air@postgres=> CREATE TABLE movies AS SELECT * FROM ( VALUES
...   ('Robert DeSouza'),
...   ('Tony Wagner'),
...   ('Sean Cortese'),
...   ('Robert DeSouza'),
...   ('Robert DeSouza'),
...   ('Tony Wagner'),
...   ('Sean Cortese'),
...   ('Charles Bastian'),
...   ('Robert DeSouza')
... ) AS t(actors);

SELECT 9
Time: 715.613 ms
[local] air@postgres=> select actors, to_char(100 * count(*) / sum(count(*)) over (), '990%') as "The Ratio" from movies group by actors;
┌─────────────────┬───────────┐
│     actors      │ The Ratio │
├─────────────────┼───────────┤
│ Charles Bastian │   11%     │
│ Tony Wagner     │   22%     │
│ Sean Cortese    │   22%     │
│ Robert DeSouza  │   44%     │
└─────────────────┴───────────┘

(4 rows)

Time: 31.501 ms

You want to make sure you account for the need to display a full range of possible values, including 100% and 0%; since to_char() will round to fit your desired precision, it is possible for an actor to show zero as their ratio, despite existing in the table:

[local] air@postgres=> delete from movies where actors <> 'Tony Wagner';
DELETE 7
Time: 36.697 ms
[local] ahuth@postgres=> insert into movies (actors) select 'Not Tony Wagner' from generate_series(1,500);
INSERT 0 500
Time: 149.022 ms
[local] ahuth@postgres=> select actors, to_char(100 * count(*) / sum(count(*)) over (), '990%') as "The Ratio" from movies group by actors;
┌─────────────────┬───────────┐
│     actors      │ The Ratio │
├─────────────────┼───────────┤
│ Tony Wagner     │    0%     │
│ Not Tony Wagner │  100%     │
└─────────────────┴───────────┘
(2 rows)

Time: 0.776 ms

If you want to expand this to show decimal places, just modify the format string. Use 0 in your formatting literal when you want to force leading or trailing zeros.

3
Keith John Hutchison On

This works by joining the grouped results with the overall results using union.

EDIT: Dropped concatenation and a 0 after comments from @Air.

select actors actor, ratio from ( 
    select 0 sort
    , actors
    , round(count(*) * 100.0 / ( select count(*) 
    from movies ),0) ratio 
    from movies 
    group by actors 
    union 
    select 1 sort
    , 'Total'
    , round(count(*) / count(*) * 100,0) ratio 
    from movies 
) actors 
order by sort, actors ; 

-- results

      actor      | ratio 
-----------------+-------
 Charles Bastian | 11
 Robert DeSouza  | 44
 Sean Cortese    | 22
 Tony Wagner     | 22
 Total           | 100
(5 rows)
4
Gordon Linoff On

You can do this using window functions. For the numeric calculation:

select m.actor,
       count(*) * 1.0 / sum(count(*)) over () as ratio
from movies m
group by m.actor;

You can convert the ratio to whatever format you want -- multiply by 100 to get a percentage, use string concatenation to add a percent. To me, something called a ratio should be between 0 and 1 (in this case).