Joining tables where two columns match

60 views Asked by At

I have a small database of various computer parts and prices. There are 2 tables, Parts and Prices.

Parts:

partID      desc        mfgr        timeStamp
----------  ----------  ----------  ----------
1           RAM         Crucial     1
2           MOBO        MSI         1
3           I7 970      Intel       1
1           RAM         Crucial     2

Prices:

productID   qty         price       timeStamp
----------  ----------  ----------  ----------
1           1           50.0        1
1           2           100.0       1
1           3           130.0       1
2           1           140.0       1
3           1           499.99      1
3           1           449.99      2
1           4           150.0       2
2           1           150.0       2
1           1           40.0        2
1           4           200.0       3

I need to grab everything from Parts that has the most recent timestamp, and GROUP_CONCAT(price) for all of the prices that match both the partID and the timestamp. So the output should look something like this:

partID      desc        mfgr        timeStamp     GROUP_CONCAT(price)
----------  ----------  ----------  ----------    -------------------
1           RAM         Crucial     2             150.0,40
2           MOBO        MSI         1             140.0
3           I7 970      Intel       1             499.99

I'm really close, but not quite getting the right results. I have tried

SELECT * FROM Parts INNER JOIN 
    (SELECT partID, MAX(Parts.timeStamp) as maxTS, GROUP_CONCAT(price) FROM
     Parts, Prices WHERE partID = Prices.productID GROUP BY partID) grouped  
ON Parts.partID = grouped.partID AND Parts.timeStamp = grouped.maxTS;

But this grabs everything from the pricing table where the part ID matches, whether or not the timestamp also matches.

partID      desc        mfgr        timeStamp   partID      maxTS       GROUP_CONCAT(price)
----------  ----------  ----------  ----------  ----------  ----------  -------------------
2           MOBO        MSI         1           2           1           140.0,150.0
3           I7 970      Intel       1           3           1           449.99,499.99
1           RAM         Crucial     2           1           2           40.0,50.0,100.0,130

So I changed my command to

SELECT * FROM Parts INNER JOIN
    (SELECT partID, MAX(Parts.timeStamp) AS maxTS, GROUP_CONCAT(price) 
    FROM Parts, Prices 
    WHERE partID = Prices.productID AND (SELECT MAX(parts.timestamp) FROM Parts) = Prices.timeStamp) grouped 
ON Parts.partID = grouped.partID AND Parts.timeStamp = grouped.maxTS;

But this only matches rows from pricing that have the largest timestamp that occurs in parts. (Which is 2)

What am I doing wrong here?

1

There are 1 answers

2
Pawel Veselov On BEST ANSWER

The mistake you are making is that you are grouping price entries before filtering out only the entries that you need. So, your grouped sub-query will contain all the prices for a partID, and there is no way to separate them out later, because they are grouped.

The best approach for solving these kind of things is breaking down your queries into pieces that you need.

You say:

I need to grab everything from Parts that has the most recent timestamp

So, let's do that. Note that this will pretty much require a sub-query, or "pivot", since RDBMS are not that good in letting you pick a row on a condition that depends on other rows (in this case, selecting rows where a field is maximum among some group). We'll call this sub-query aux, and use it to select entries from parts that match the partID/timeStamp combination that satisfies the condition:

select * from parts,
(select partId, max(timeStamp) maxts from parts group by partId) aux
where parts.partId = aux.partId and parts.timeStamp = aux.maxts

This is using implicit joins, you can rewrite this query using JOIN syntax as well. I personally avoid JOIN syntax unless I need left, or other special joins:

select * from parts
join (select partId, max(timeStamp) maxts from parts group by partId) aux
on parts.partId = aux.partId and parts.timeStamp = aux.maxts

Now, you wanna join it with the prices of the same ID/timestamp, but group the prices together (as GROUP_CONCAT). The key here is to only select (which happens before grouping) the price entries that match the "latest" part entries.

Because the first query produces output that can be directly joined with prices tables, the query just needs to be extended to include price table, and grouping:

select parts.partid, parts.desc, group_concat(prices.price) from 
parts, prices, (
  select partId, max(timeStamp) maxts from parts group by partId) aux
where 
  parts.partId = aux.partId and
  parts.timeStamp = aux.maxts and
  prices.productID = parts.partid and 
  prices.timestamp = parts.timestamp
group by parts.partid, parts.desc

This can be re-written using JOIN syntax as well:

select parts.partid, parts.desc, group_concat(prices.price) 
from parts 
join (select partId, max(timeStamp) maxts from parts group by partId) aux
on parts.partId = aux.partId and parts.timeStamp = aux.maxts
join prices on prices.productID = parts.partid and prices.timestamp = parts.timestamp
group by parts.partid, parts.desc

This query is a little bit convoluted, and depending on the data set, it may be beneficial to re-write it in a different way, to make sure database understands (good for the optimizer) what gets filtered first. We can move the filtered "parts" into a sub-query of its own (call it bux), and then join that with prices table:

select bux.partid, bux.desc, group_concat(prices.price) from prices 
join (
  select parts.partId, parts.desc, aux.maxts 
  from parts join
  (select partId, max(timeStamp) maxts from parts group by partId) aux
  on parts.partId = aux.partId and parts.timeStamp = aux.maxts
) bux
on prices.productID = bux.partid and prices.timestamp = bux.maxts
group by bux.partid, bux.desc

If you check the execution plans between the two, you will see a difference. Selecting which one to use on production would depend on which one performs better.

http://sqlfiddle.com/#!9/f12c8/10/0