I consider a row duplicate when the combination of hash_key and load_date is repeating. I can find such rows using this CTE:
with w_duplicated_rows as (
select hash_key, load_date, count(*),
--row_number() over (partition by hash_key, load_date order by load_date desc) as row_num
from my_table
group by hash_key, load_date having count(*) > 1
)
This gives me the combinations which are being repeated in the table.
Now, I want to delete duplicates from the original my_table. I have another column called STATUS. While deleting duplicates, I want to delete the row after filtering on ORDER BY STATUS DESC. Means I want to keep the top row and remove the second one (where status is lower).
How can I achieve this?
I can suggest using
ROW_NUMBER()window function to assign a row number to each row within the duplicate groups, ordered bySTATUS DESC. Then, you can use a common table expression(CTE)to identify the rows with row numbers greater than 1 and delete them from the original table.Here's an example query: