I have a Delta Lake table in Azure, and I'm using Databricks. When we add new entries, we use MERGE INTO to prevent duplicates from getting into the table. However, duplicates did get in, and I'm not sure how. Maybe the MERGE INTO conditions weren't set up properly.
However it happened, the duplicates are there now. Is there any way to detect and remove them from the table? All the documentation I've found shows how to deduplicate the dataset before merging, but nothing covers the case where the duplicates are already in the table.
Thanks
If the duplicates already exist in the target table, your only options are: