I hope I've put the query together correctly below. I've simplified from something much, much bigger. But my question should be clear enough from the example.
In the innermost query, nest1
, let's say the results come back with three entries for invoice 123. The invoice was created (1), then two cash deposits were made (2, 3), the second of which (3), paid off the invoice. From that result set, I'm aggregating the sum. I'm also getting the paid_off_date, which only one of the rows will have.
I'm happy to be schooled on better ways to design this query, but what I don't understand is why I have to select the aggregates all the way out. Why, for example, when I select SUM(cash_deposit) AS 'cash_deposit'
in nest2
, do I also have to select SUM(cash_deposit)
in the outermost query? Once it is aggregated in a subquery, why isn't the aggregate passed along? And why do I have to keep grouping by id at each level?
SELECT id, SUM(cash_deposit), MAX(paid_off_date), MAX(job_phase), MAX(cust_id)
FROM
(
SELECT id AS id, SUM(cash_deposit) AS 'cash_deposit', MAX(paid_off_date) AS 'paid_off_date', MAX(job_phase) AS 'job_phase'
FROM
(
SELECT id, cash_deposit, paid_off_date
FROM invoice
GROUP BY id
) nest1
JOIN job j
ON nest1.id = j.id
GROUP BY id
) nest2
JOIN customer c
ON c.invoice = nest2.id
GROUP BY id
Clarification
Thanks for any posts with improved versions of the query. I'm working with an existing query that produces unexpected results when I start trying to turn the nesting into joins. I'm going to keep playing with it, but I'd also really appreciate if someone could answer the question about aggregates and why they have to be repeated when nesting. In C#, I can sum up three numbers in one function and then pass the resulting value back. I don't have to keep summing the results. I need some help understanding how SQL subqueries function differently in that respect.
Possible Answer
One of my coworkers pointed out that in this case, It's because of my GROUP BY
clauses. Since cash_deposit
and job_phase
, for example, aren't in the GROUP BY
clause, they have to be aggregated.
It's forcing you to re-run the aggregates, because you are putting group by in each step of the query. If you remove
group by id
from all but the innermost query, you won't have to re-do the aggregation.Try writing it like this:
You could also do the two joins in one step now with the same exact result set, but I wanted to show you the minimum amount of changes necessary.