Left outer join on aggregate queries

3.9k views Asked by At

So I have two payment tables that I want to compare in a Oracle SQL DB. I want to compare the the total payments using the location and invoice and total payments. It's more comlex then this but basically it is:

select
  tbl1.location,
  tbl1.invoice,
  Sum(tbl1.payments),
  Sum(tbl2.payments)    
From 
  tbl1    
  left outer join tbl2 on 
    tbl1.location = tbl2.location 
    and tbl1.invoice = tbl2.invoice    
group by 
  (tbl1.location,tbl1.invoice)

I want the left outer join because in addition to comparing payment amounts, I want see check all orders in tbl1 that may not exist in tbl2.

The issue is that there is that there is multiple records for each order (location & invoice) in both tables (not the same number of records necessarily ie 2 in tbl1 to 1 in tbl2 or vice versa) but the total payments for each order (location & invoice) should match. So just doing a direct join gives me a cartesian product.

So I am thinking I could do two queries, first aggregating the total payments by store & invoice for each and then do a join on those results because in the aggregate results, I would only have one record for each order (store & invoice). But I don't know how to do this. I've tried several subqueries but can't seem the shake the cartesian product. I'd like to be able to do this in one query as opposed to creating tables and joining on those as this will be ongoing.

Thanks in advance for any help.

3

There are 3 answers

1
Jorge Campos On BEST ANSWER

You can use the With statement to create the two querys and join then as you said. I will put just the sintaxe and if you need more help just ask. Thats because you didn't provide full details on your tables. So I will just guess on my answer.

WITH tmpTableA as ( 
        select
          tbl1.location,
          tbl1.invoice,
          Sum(tbl1.payments) totalTblA
        From 
          tbl1
        group by 
          tbl1.location,
          tbl1.invoice
         ),
   tmpTableB as ( 
        select
          tbl2.location,
          tbl2.invoice,
          Sum(tbl2.payments) totalTblB
        From 
          tbl2
        group by 
          tbl2.location,
          tbl2.invoice
         )
Select tmpTableA.location,  tmpTableA.invoice, tmpTableA.totalTblA,
       tmpTableB.location,  tmpTableB.invoice, tmpTableB.totalTblB
  from tmpTableA, tmpTableB
 where tmpTableA.location = tmpTableB.location (+)
   and tmpTableA.invoice = tmpTableB.invoice (+)

The (+) operator is the left join operator for Oracle Database (Of course, you can use the LEFT JOIN statements if you prefer )

1
Guntram Blohm On

Sorry, my first answer was wrong. Thank you for providing the sqlfiddle, MT0.

The point that i missed is that you need to sum up the payments on each table first, so there's only one line left in each, then join them. This is what MT0 does in his statements.

If you want a solution that looks more "symmetric", try:

select A.location, A.invoice, B.total sum1, C.total sum2
from (select distinct location, invoice from tbl1) A
left outer join (select location, invoice, sum(payments) as total from tbl1 group by location, invoice) B on A.location=B.location and A.invoice=B.invoice
left outer join (select location, invoice, sum(payments) as total from tbl2 group by location, invoice) C on A.location=C.location and A.invoice=C.invoice

which results in

LOCATION    INVOICE SUM1    SUM2
a           2       3       2
a           1       5       3
b           1       1       5
b           2       1       (null)
3
MT0 On

Two other options:

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE tbl1 ( id, location, invoice, payments ) AS 
          SELECT  1, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  2, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  3, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  4, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  5, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  6, 'a', 2, 1 FROM DUAL
UNION ALL SELECT  7, 'a', 2, 1 FROM DUAL
UNION ALL SELECT  8, 'a', 2, 1 FROM DUAL
UNION ALL SELECT  9, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 10, 'b', 2, 1 FROM DUAL;

CREATE TABLE tbl2 ( id, location, invoice, payments ) AS 
          SELECT  1, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  2, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  3, 'a', 1, 1 FROM DUAL
UNION ALL SELECT  4, 'a', 2, 1 FROM DUAL
UNION ALL SELECT  5, 'a', 2, 1 FROM DUAL
UNION ALL SELECT  6, 'b', 1, 1 FROM DUAL
UNION ALL SELECT  7, 'b', 1, 1 FROM DUAL
UNION ALL SELECT  8, 'b', 1, 1 FROM DUAL
UNION ALL SELECT  9, 'b', 1, 1 FROM DUAL
UNION ALL SELECT 10, 'b', 1, 1 FROM DUAL;

Query 1:

This one uses a correlated sub-query to calculate the total for the second table:

SELECT location,
       invoice,
       SUM( payments ) AS total_payments_1,
       COALESCE( (SELECT SUM( payments )
                  FROM   tbl2 i
                  WHERE  o.location = i.location
                     AND o.invoice  = i.invoice),
                 0 ) AS total_payments_2
FROM   tbl1 o
GROUP BY
       location,
       invoice
ORDER BY
       location,
       invoice

Results:

| LOCATION | INVOICE | TOTAL_PAYMENTS_1 | TOTAL_PAYMENTS_2 |
|----------|---------|------------------|------------------|
|        a |       1 |                5 |                3 |
|        a |       2 |                3 |                2 |
|        b |       1 |                1 |                5 |
|        b |       2 |                1 |                0 |

Query 2:

This one uses a named sub-query to pre-calculate the totals for table 1 then performs a LEFT OUTER JOIN with the second table and includes the total for table 1 in the group.

Without any indexes then, from the explain plans, Query 1 seems to be much more efficient but your indexes might mean the optimizer finds a better plan.

WITH tbl1_sums AS (
  SELECT location,
         invoice,
         SUM( payments ) AS total_payments_1
  FROM   tbl1
  GROUP BY
         location,
         invoice
)
SELECT t1.location,
       t1.invoice,
       t1.total_payments_1,
       COALESCE( SUM( t2.payments ), 0 ) AS total_payments_2
FROM   tbl1_sums t1
       LEFT OUTER JOIN 
       tbl2 t2
       ON (    t1.location = t2.location
           AND t1.invoice = t2.invoice)
GROUP BY
       t1.location,
       t1.invoice,
       t1.total_payments_1
ORDER BY
       t1.location,
       t1.invoice

Results:

| LOCATION | INVOICE | TOTAL_PAYMENTS_1 | TOTAL_PAYMENTS_2 |
|----------|---------|------------------|------------------|
|        a |       1 |                5 |                3 |
|        a |       2 |                3 |                2 |
|        b |       1 |                1 |                5 |
|        b |       2 |                1 |                0 |