SQL BQ Return user's best and worst rated movie

119 views Asked by At

I have data in format:

user_id user_name movie_rating movie_name
1 x 1 asd1
1 x 3 asd2
1 x 5 asd3
1 x 5 asd4
2 y 2 asd4
2 y 3 asd5
2 y 4 asd6

I want to extract the best and the worst rated movie per user in a format where best_rated_movie and worst_rated_movie is array (because user can have multiple worst movies etc):

user_id user_name best_rated_movie worst_rated_movie
1 x (asd3,asd4) asd1
2 y asd6 asd4

I managed to extract table to output where I have only the best and the worst rated movies:

user user_name movie_rating movie_name
1 x 1 asd1
1 x 5 asd3
1 x 5 asd5
2 y 2 asd4
2 y 4 asd6

Here is a code which I used for that:

with best_movie_rating as (
  select
    user_id,
    max(movie_rating) as max_rating
  from source_table
  group by user_id
),

worst_movie_rating as (
  select
    user_id,
    min(movie_rating) as min_rating
  from source_table
  group by user_id
),
columns_final as (
  select
    t1.user_id,
    t1.user_name,
    t1.company_name,
    t1.movie_rating,
    t1.movie_name 
  from source_table t1
  inner join best_movie_rating t2
    on t1.user_id = t2.user_id 
  inner join worst_movie_rating t3
    on t1.user_id = t3.user_id 
  where (t1.movie_rating= t2.max_rating and t1.user_id = t2.user_id)
    or (t1.movie_rating= t3.min_rating and t1.user_id = t3.user_id)
)
select * from columns_final   

Unfortunately I don't know how to move forward, I've tried aggregating with no success (especially with arrays in play). I'd appreciate at least a suggestion how to deal with this.

2

There are 2 answers

0
Mikhail Berlyant On BEST ANSWER

I want to extract the best and the worst rated movie per user in a format where best_rated_movie and worst_rated_movie is array (because user can have multiple worst movies etc)

Consider below

select user_id, user_name,
  array_concat_agg(movies order by movie_rating desc limit 1) best_rated_movie,
  array_concat_agg(movies order by movie_rating limit 1) worst_rated_movie,
from (
  select user_id, user_name, movie_rating, array_agg(movie_name)  movies, 
  from `project.dataset.table` t
  group by user_id, user_name, movie_rating
)
group by user_id, user_name    

If applied to sample data in your question - output is

enter image description here

If (as other answer imply) you want result as a string of comma separated movies - you can use below

select user_id, user_name,
  string_agg(movies order by movie_rating desc limit 1) best_rated_movie,
  string_agg(movies order by movie_rating limit 1) worst_rated_movie,
from (
  select user_id, user_name, movie_rating, string_agg(movie_name)  movies, 
  from `project.dataset.table` t
  group by user_id, user_name, movie_rating
)
group by user_id, user_name

with output

enter image description here

0
Kazi Mohammad Ali Nur Romel On

You can use string_agg() like below:

with columns_final as (
  select
    t1.user_id,
    t1.user_name,    
    t1.movie_rating,
    t1.movie_name ,
    max (movie_rating)over (partition by user_id ) max_rating,
    min (movie_rating)over (partition by user_id ) min_rating
  from source_table t1  
)

select user_id,user_name,string_agg(case when movie_rating=max_rating then movie_name end,',')best_rated_movie,
string_agg(case when movie_rating=min_rating then movie_name end,',')worst_rated_movie
from columns_final   
where movie_rating =max_rating or movie_rating=min_rating
group by user_id,user_name

I have also made some changes in your query to make it compact. Instead of two common table expression I have used window function to calculate max and min movie_rating.