When executed, is there any difference at between the following two sql queries:

SELECT name, count(*) FROM mytable GROUP BY name HAVING count(*) > 1

And:

SELECT * from (SELECT name, count(*) cnt FROM mytable GROUP BY name) x where cnt > 1

In other words, is having more a "convenience" clause to simplify having to do subselect, or does the query engine fundamentally performance different when a having statement is used vs the second approach? Currently in mysql:

enter image description here

Create table:

CREATE TABLE `mytable` (
  `name` varchar(20) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf-8;

1 Answers

1
Gordon Linoff On Best Solutions

In almost any other database, the two would be equivalent. For conciseness, HAVING is usually a better choice.

At least historically, MySQL materialized subqueries. So, this query:

SELECT * 
FROM (SELECT name, count(*) as cnt
      FROM mytable
      GROUP BY name
     ) x 
WHERE cnt > 1;

suggests that it is going to write out the derived table, and then re-scan it for the final WHERE. However, this makes little difference to performance because the GROUP BY is already reading and writing the data.

So, these queries are probably quite similar in performance on MySQL. And, they would have the same execution plan on almost any other database. The HAVING clause results in the simpler query.