I'm seeking some advice on handling scenarios where the number of rows in a result set explodes when you LEFT JOIN multiple tables. I understand this is expected behavior, but I’d like to know the recommended way to handle it. I know I can have my application perform multiple queries to reduce the number of rows returned overall, but I am trying to see if there is a way to have SQL do most of the heavy lifting and still make only one “round trip” (e.g., a la this answer).
Example
Here’s my SQL:
SELECT al.title albumTitle
, releaseDate
, name artistName
, duration
, t.title trackTitle
, styleName
FROM album al
LEFT JOIN lu_albumartist aa ON aa.albumId = al.albumId
LEFT JOIN artist ar ON ar.artistId = aa.artistId
LEFT JOIN track t ON t.albumId = al.albumId
LEFT JOIN lu_albumstyle ast ON ast.albumId = al.albumId
LEFT JOIN style s ON s.styleId = ast.styleId
WHERE al.title LIKE '%A Love Supreme%'
This SQL Fiddle helps demonstrate the problem:
I am retrieving information on a music album. I really only need 11 rows to be able to have my app populate all fields (1 album, 4 artists, 3 tracks, 3 styles), but the query pulls back 36 rows. I won't be able to use most rows; for example, I don’t care about all the permutations of styles & artists or styles & tracks. When I add even more LEFT JOINS for other things (e.g., instruments, formats, comments, track play info, etc.) the number of permutations can expand into the 10,000s!
Ideally, what I'd like is a result set that’s more concise:
Super Compact: 4 rows (doesn't make much sense to read the table this way, though the application could parse it)
| title | releaseDate | name | duration | title | styleName |
|----------------|----------------------------|----------------|-------------|----------------------------------------|-------------|
| A Love Supreme | 1965 | John Coltrane | 479 | Part I - Acknowledgement | Free Jazz |
| [something] | [something] | McCoy Tyner | 435 | Part II - Resolution | Hard Bop |
| [something] | [something] | Jimmy Garrison | 1060 | Part III - Pursuance / Part IV - Psalm | Modal |
| [something] | [something] | Elvin Jones | [something] | [something] | [something] |
..."[something]" just means the value can be anything; the app won't care
Compact: 11 rows (makes sense to humans)
| title | releaseDate | name | duration | title | styleName |
|----------------|----------------------------|----------------|----------|----------------------------------------|-----------|
| A Love Supreme | 1965 | | | | |
| | | John Coltrane | | | |
| | | McCoy Tyner | | | |
| | | Jimmy Garrison | | | |
| | | Elvin Jones | | | |
| | | | 479 | Part I - Acknowledgement | |
| | | | 435 | Part II - Resolution | |
| | | | 1060 | Part III - Pursuance / Part IV - Psalm | |
| | | | | | Free Jazz |
| | | | | | Hard Bop |
| | | | | | Modal |
I'm less concerned about performance than my app's code portability, readability, & scalability.
Based on that earlier linked question, the other answers and comments suggested I shouldn’t attempt to join more than 3 tables.
- Should I just give up on this approach? If I were to use multiple queries, there could be a dozen or more.
- I suspect there is a way to solve this with UNION ALLs, but is that the best practice?
- I'm guessing this is a relatively common problem, but I couldn't find a good answer or a set of guidelines to follow. What’s the recommended approach for this situation?
I can just tell you how I would do it in a first approach.
I would use multiple queries, but not for each thing one query but one query for multiple related things. Like you already did in your query. But I would probably remove the tracks and use then this query for the album search (like in discogs and as well as on the other mentioned sites).
If the user then selects an album, I would load in one query multiple other related things. Like tracks with track play info in one query, in another query formats and instruments and so on. So I would keep related things together, actually like I would group it on the ui.
As always, I would keep it as simple as possible. And think about how much info the user needs on one page. Some things can be loaded lazy, if the user actively wants to see it.
The suggestions not to join more than 3 tables is to general I think. Think about one to one or many to one relations, where at most one record is returned and the number of records is not getting bigger.