SQL Query Filter to locate DUPLICATES in a column based on Values in 2 other columns

128 views Asked by At

I am using SQL Server 2014 and I am working with a table named ReservationStay. It contains the records of all guests with their names, arrival dates and departure dates. An operation has been undertaken that has split the records of hundreds of guests into 2 separate entries, which means that these entries now have the same Guest name but with different arrival dates and departure dates.

An example of an original entry:

 Name         ArrivalDate        DepartureDate
Simon G       2015-06-01          2015-06-08

Here is what happened after that split operation was effected, say, on 2015-06-03:

 Name         ArrivalDate        DepartureDate
Simon G       2015-06-01          2015-06-03
Simon G       2015-06-03          2015-06-08

This split operation was carried out on several days.

I need a filter in my query that will take into account the following:

WHERE Name is a duplicate and the DepartureDate of the first entry = the ArrivalDate of the second entry.

Basically, I want to re-construct the original entry.

How do I write this filter?

3

There are 3 answers

4
Giorgos Betsos On BEST ANSWER

You can use LEAD, LAG window functions in order to locate records that have been split:

SELECT Name, MIN(ArrivalDate) AS ArrivalDate, MAX(DepartureDate) AS DepartureDate
FROM (
SELECT Name, ArrivalDate, DepartureDate, 
       CASE 
          WHEN ArrivalDate = LAG(DepartureDate) OVER (PARTITION BY Name 
                                                      ORDER BY ArrivalDate) 
               OR                                                     

               DepartureDate = LEAD(ArrivalDate) OVER (PARTITION BY Name 
                                                       ORDER BY ArrivalDate)
          THEN 1
          ELSE 0
       END AS HasBeenSplit                                                       
FROM mytable ) t
GROUP BY Name, HasBeenSplit

This query will give you back the original version of your table.

Demo here

0
Carsten Massmann On

It can be done with a simple INNER JOIN ("self-join"):

SELECT a.Name, a.ArrivalDate, b.DepartureDate
FROM dtab a 
INNER JOIN dtab b ON b.Name=a.Name
AND b.ArrivalDate=a.DepartureDate

see here http://sqlfiddle.com/#!9/51ea3/2

I added a few more lines to the table to have an example that will not fulfill the condition:

CREATE TABLE dtab (Name varchar(11),ArrivalDate varchar(10),DepartureDate varchar(10));
INSERT INTO dtab (Name,ArrivalDate,DepartureDate)
VALUES
('Simon G', '2015-06-01', '2015-06-03'),
('Simon G', '2015-06-03', '2015-06-08'),
('Peter M', '2015-03-07', '2015-03-15'),
('Peter M', '2015-05-05', '2015-05-10');

and get the desired result

|    Name | ArrivalDate | DepartureDate |
|---------|-------------|---------------|
| Simon G |  2015-06-01 |    2015-06-08 |

Edit

Just noticed, that in order to reconstruct the unsplit table you will also need to list also the entries that have not been split before. To get that you could do the following:

WITH combined AS (
 SELECT a.Name cnam, a.ArrivalDate carr, b.DepartureDate cdep
 FROM dtab a 
 INNER JOIN dtab b ON b.Name=a.Name
 AND b.ArrivalDate=a.DepartureDate
)
SELECT d.* FROM dtab d
LEFT JOIN combined ON cnam=Name AND (carr=ArrivalDate OR cdep=DepartureDate)
WHERE cdep IS NULL
UNION ALL 
SELECT * FROM combined

I put the original SELECT statement into a common table expression (combined) and used it to check on the original table whether arrival or departure dates of any of thoses entries coincide. If they do, the original entries will not be listed, otherwise they will be listed in UNION with the entries of the combined table.

Now we get

|    Name | ArrivalDate | DepartureDate |
|---------|-------------|---------------|
| Peter M |  2015-03-07 |    2015-03-15 |
| Peter M |  2015-05-05 |    2015-05-10 |
| Simon G |  2015-06-01 |    2015-06-08 |

see here http://sqlfiddle.com/#!6/7d325/5

This solution will work as of SQL Server 2005 (LAG / LEAP were introduced in SQL Server 2012).

2
mohan111 On
    declare @t table (name varchar(10),Arrivaldate varchar(20),Departure varchar(20)) 
    insert into @t(name,Arrivaldate,Departure)
values 
('Simon G','2015-06-01','2015-06-03'),
('Simon G','2015-06-03','2015-06-08')

    Select A.name,A.Arrivaldate,A.Departure from (
    select NAME,MIN(Arrivaldate)Arrivaldate,MAX(Departure)Departure from @t GROUP BY NAME)A