I need to join two tables by approximate date in such a way that every row from table old
gets matched to only the closest date in table new
- one-to-one. No copies of new
rows are allowed - only match once for the smallest difference.
Here is some example to try:
CREATE TABLE `new` (`ID` int(2), `date` datetime, `new` varchar(1));
INSERT INTO `new` (`ID`, `date`, `new`) VALUES
(1, '2016-03-02 12:20:00', 't'),
(1, '2016-03-07 12:20:00', 'u'),
(1, '2016-04-02 12:20:00', 'v'),
(2, '2016-04-12 11:03:00', 'x');
CREATE TABLE `old` (`ID` int(2), `date` datetime, `old` varchar(1));
INSERT INTO `old` (`ID`, `date`, `old`) VALUES
(1, '2016-03-07 12:20:00', 'a'),
(1, '2016-04-02 12:20:00', 'b'),
(1, '2016-03-01 10:09:00', 'c'),
(1, '2015-04-12 10:09:00', 'd'),
(1, '2016-03-03 12:20:00', 'e');
The output I'm expecting is this:
ID old.date old new.date new
1 2016-03-07 12:20:00 a 2016-03-07 12:20:00 u
1 2016-04-02 12:20:00 b 2016-04-02 12:20:00 v
1 2016-03-01 10:09:00 c NULL NULL
1 2015-04-12 10:09:00 d NULL NULL
1 2016-03-03 12:20:00 e 2016-03-02 12:20:00 t
2 NULL NULL 2016-04-12 11:03:00 x
I was able to get somewhat close to this with:
SELECT * FROM old A LEFT OUTER JOIN new B ON A.ID=B.ID AND ABS(TIMESTAMPDIFF(day, A.date, B.date))<2
UNION
SELECT * FROM old A RIGHT OUTER JOIN new B ON A.ID=B.ID AND ABS(TIMESTAMPDIFF(day, A.date, B.date))<2
ORDER BY old
but clearly this ends up matching multiple rows within the specified time window instead of only the best match. Playing with number of days is not a solution for me, as in practice I have two huge tables to join and this needs to work with a time window in which there would be multiple matches in many rows.
It seems the only way to achieve approximate 1-to-1 join is by using a
cursor
within the stored procedure.Thank you @Strawberry for pointing me in the right direction - you will see pieces of your code reused below. Here is the solution that eventually worked for me. It outputs records sorted differently, but at least it is truly 1-to-1 match.
And the output is (using data from the above example, with PRIMARY key set to ID+date):