I'm working on a SQL homework problem:

"For each movie that has at least one rating, find the movie title and total number of stars, the highest star and the person who gave highest star."

Database:

create table Movies(mID integer, title varchar(100));
create table Reviewers(rID integer, name varchar(100)); 
create table Ratings(rID integer, mID integer, stars integer);

insert into Movies values(101, 'Gone with the Wind'); 
insert into Movies values(102, 'Star Wars'); 
insert into Movies values(103, 'The Sound of Music');

insert into Reviewers values(201, 'Sarah Martinez'); 
insert into Reviewers values(202, 'Daniel Lewis'); 
insert into Reviewers values(203, 'Brittany Harris'); 

insert into Ratings values(201, 101, 2); 
insert into Ratings values(203, 101, 4); 
insert into Ratings values(203, 102, 4);
insert into Ratings values(203, 103, 4);
insert into Ratings values(202, 103, 2);

Best query I can come up with is:

SELECT title,
SUM(stars) AS total_stars,
MAX(stars) AS highest_stars,
name AS highest_stars_reviewer
FROM Movies
INNER JOIN Ratings USING(mID)
INNER JOIN Reviewers USING(rID)
GROUP BY mID;

The problem is that instead of returning the name of the reviewer who gave the highest stars review, the query returns the reviewer with the lower rID who reviewed the movie.

I would appreciate any help with this query to get the desired result.

1 Answers

1
Gordon Linoff On

The proper way to do this in SQL uses window functions:

SELECT m.title,
       SUM(r.stars) AS total_stars,
       MAX(r.stars) AS highest_stars,
       MAX(CASE WHEN r.seqnum = 1 THEN rv.name END) AS highest_stars_reviewer
FROM Movies m INNER JOIN
     (SELECT r.*,
             ROW_NUMBER() OVER (PARTITION BY m.id ORDER BY r.stars DESC) as seqnum
      FROM Ratings r
     ) r
     USING (mID) INNER JOIN
     Reviewers rv
     USING (rID)
GROUP BY m.title, m.mID;

Notes:

  • If you are using multiple tables in a query, you should be qualifying all column names.
  • The GROUP BY columns should match the SELECT columns -- although your version is okay because SQL allows you to do this when the aggregation key is a unique key.
  • This returns an arbitrary reviewer with the highest stars, in the event that there is more than one review with the maximum.