SQL correlated subquery sequence of events

Question

SQL correlated subquery sequence of events

137 views Asked by Jay Wang At 20 December 2016 at 02:29

This query finds the largest country by area in each continent. It shows the continent, name, and area.

SELECT
  c1.continent,
  c1.name,
  c1.area
FROM
  countries c1
WHERE
  c1.area = (
        SELECT
          MAX(c2.area)
        FROM
          countries c2
        WHERE
          c1.continent = c2.continent
        )

If I'm understanding correctly the subquery will match each entry's continent from c1 to all c2 entries with the same continent and find the max. And if so the subquery will be run for every entry in c1. Doesn't that seem unnecessary?

Can someone please explain the sequence of events that happens in this query and if there is a better way to do this. Thanks.

Original Q&A

There are 2 answers

**Gordon Linoff** · Answer 1 · 2016-12-20T02:33:06+00:00

You need to understand that SQL queries are non-procedural: They describe the result set, not how to create it.

If an index is available on countries(continent, area), then the "subquery" is simply a look up in an index -- a very efficient operation.

If an index is not available, then some database engines are smart enough to implement the subquery as a type of join operation -- typically using a hash join if the data is large enough.

Admittedly, some databases would run the subquery for each row in the outer query, essentially doing a nested loop join. But there is no requirement that the engine do this.

**AudioBubble** · Answer 2 · 2016-12-20T08:56:19+00:00

Gordon already answered the part about how this is processed.

If you want to know if Postgres re-writes the co-related subquery to a join or not - check the execution plan using explain (analyze) ....

and if there is a better way to do this.

Yes there is. With standard SQL this is typically much faster when using window functions:

select continent, name, area
from (
  select continent, name, area, 
         dense_rank() over (partition by continent order by area desc) as rnk
  from countries
) t
where rnk = 1;

With Postgres using distinct on () is typically faster then the solution using window functions:

select distinct on (continent) * 
from countries 
order by continent, area desc

TechQA.

SQL correlated subquery sequence of events

There are 2 answers

Related Questions in SQL

Related Questions in POSTGRESQL

Related Questions in GREATEST-N-PER-GROUP

Related Questions in CORRELATED-SUBQUERY

Popular Questions

Popular Tags

Trending Questions