SQL Join returning duplicate rows

6.4k views Asked by At

I'm struggling with how to phrase my problem, sorry for any confusion. I have 3 tables associated with each purchase: Transaction; TransactionEntry, which includes separate rows for each item in the transaction; and TaxEntry, which includes the the taxID and taxAmount for each entry in TransactionEntry.

I want to find the total, in a given month, for each TaxAmount AND price by TaxID. Price and Date are found in Transaction Entry, while TaxAmount and TaxID are found in TaxEntry. In addition, we have multiple taxes per item (state and county).

My tables look something like this:

Transaction Entry:

TransactionNum  Price  Date
1               8.99   2015-01-06     
1               3.65   2015-01-06     
2               3.99   2015-03-06     

TaxEntry:

TransactionNum  TaxID   TaxAmount
1               2       0.89        
1               16      0.09        
1               2       0.37        
1               16      0.04        
2               4       0.40        
2               16      0.04        

I want to find the sum of Price, by TaxID, for a given month. So, for example, I want 8.99+3.65 to be returned for TaxID 2; 3.99 for TaxID 4, and 8.99+3.65+3.99 for TaxID 16. Everything I try returns the incorrect amounts, repeating each number multiple times.

My code is:

SELECT SUM(Price), TaxID
FROM TaxEntry XE
JOIN TransactionEntry TE
ON XE.TransactionNumber = TE.TransactionNumber
WHERE CAST(TE.TransactionTime AS DATE) BETWEEN '06-01-2015' and '06-30-2015' 
GROUP BY XE.TaxID 

My results are in the range of up to three times too large.

I have tried it with joins of various types but it still repeats info. I feel like I should be doing something with DISTINCT but it fails to accomplish anything.

2

There are 2 answers

0
Gordon Linoff On BEST ANSWER

You will get the right result if you pre-aggregate the tax table:

SELECT XE.Price, TE.TaxID
FROM (SELECT TransactionNumber, SUM(Price) as Price
      FROM TaxEntry XE
     ) XE JOIN
     (SELECT TransactionNumber, TaxID
      FROM TransactionNum TE
      GROUP BY TransactionNumber
     ) TE
     ON XE.TransactionNumber = TE.TransactionNumber
WHERE TE.TransactionTime >= '2015-06-01' AND 
      TE.TransactionTime < '2015-07-01' ;

Notice:

  • Each subquery now aggregates by the transaction number, so duplicated rows are not produced.
  • You no longer need the outer group by.
  • The fancy date arithmetic has been replaced by direct comparisons. This allows the use of an index (if available).
  • Replaced your date constants with ISO-standard dates.
0
MT0 On

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE Transactions ( TransactionNum,  Price,  "Date" ) AS
          SELECT 1, 8.99, DATE '2015-06-01' FROM DUAL
UNION ALL SELECT 1, 3.65, DATE '2015-06-01' FROM DUAL
UNION ALL SELECT 2, 3.99, DATE '2015-06-03' FROM DUAL;

CREATE TABLE Taxes ( TransactionNum, TaxID, TaxAmount ) AS
          SELECT 1, 2,       0.89         FROM DUAL
UNION ALL SELECT 1, 16,      0.09         FROM DUAL
UNION ALL SELECT 1, 2,       0.37         FROM DUAL
UNION ALL SELECT 1, 16,      0.04         FROM DUAL
UNION ALL SELECT 2, 4,       0.40         FROM DUAL
UNION ALL SELECT 2, 16,      0.04      FROM DUAL;

Query 1:

SELECT t.TransactionNum,
       t.total_price + COALESCE( x.total_tax, 0 ) AS total_cost
FROM   ( SELECT TransactionNum,
                SUM( Price ) AS total_price
         FROM   Transactions
         WHERE  "Date" BETWEEN DATE '2015-06-01' and DATE '2015-06-30'
         GROUP BY TransactionNum ) t
       LEFT OUTER JOIN
       ( SELECT TransactionNum,
                SUM( TaxAmount ) AS total_tax
         FROM   Taxes
         GROUP BY TransactionNum ) x
       ON ( t.TransactionNum = x.TransactionNum )

Results:

| TRANSACTIONNUM | TOTAL_COST |
|----------------|------------|
|              1 |      14.03 |
|              2 |       4.43 |