T-SQL Query to remove duplicate records in the output based on one particular column

609 views Asked by At

I am running SQL Server 2014 and I have the following T-SQL query:

 USE MYDATABASE

 SELECT * 
 FROM RESERVATIONLIST
 WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')

RESERVATIONLIST mentioned in the code above is a view. The query gives me the following output (extract):

ID     NAME       DOA         DOD       Nights Spent       MTH
--------------------------------------------------------------------
251    AH     2015-01-12   2015-01-15      3           JANUARY 2015
258    JV     2015-01-28   2015-02-03      4           JANUARY 2015
258    JV     2015-01-28   2015-02-03      2           FEBRUARY 2015

The above output consist of around 12,000 records.

I need to modify my query so that it eliminates all duplicate ID and give me the following results:

 ID    NAME       DOA          DOD       Nights Spent       MTH
 --------------------------------------------------------------------
 251     AH     2015-01-12   2015-01-15      3           JANUARY 2015
 258     JV     2015-01-28   2015-02-03      4           JANUARY 2015

I tried something like this, but it's not working:

USE MYDATABASE

SELECT * 
FROM RESERVATIONLIST
WHERE [MTH] IN ('JANUARY 2015', 'FEBRUARY 2015')    
GROUP BY [ID] 
HAVING COUNT ([MTH]) > 1
2

There are 2 answers

1
Deep On BEST ANSWER

Following query will return one row per ID :

SELECT * FROM 
(
    SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY (SELECT NULL)) rn FROM RESERVATIONLIST
    WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1

Note : this will return a random row from multiple rows having same ID. IF you want to select some specific row then you have to define it in order by. For e.g. :

SELECT * FROM 
(
    SELECT *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DOA DESC) rn FROM RESERVATIONLIST
    WHERE [MTH] IN ('JANUARY 2015','FEBRUARY 2015')
) T
WHERE rn = 1

definitely, it will return the row having max(DOA).

2
Martijn van der Jagt On

You are trying to do a GROUP BY statement which IMHO is the right way to go. You should formulate all columns that are a constant, and roll-up the others. Depending on the value of DOD and DOA I can see two solutions:

SELECT ID,NAME,DOA,DOD,SUM([Nights Spent]) as Nights,
   min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME,DOA,DOD

OR

SELECT ID,NAME,min(DOA) as firstDOA,max(DOD) as lastDOD,SUM([Nights Spent]) as Nights,
   min(MTH) as firstRes, max(MTH) as lastRes
FROM RESERVATIONLIST
GROUP BY ID,NAME