I was very surprised by the limitations and changes that come with using DLTs with Unity Catalog. Of everything on that list, one of the scariest things (and for me a dealbreaker) is the architectural decision to tightly couple a DLT with the data it produces. Previously, when I was using DLTs with Hive Metastore, I was free to change and delete DLTs without being scared that my data would be lost (I used unmanaged tables, i.e. I set the location explicitly in the DLT). Now that possibility doesn't exist. Long story short, if I remove a DLT, all of its data is lost. If I remove a particular task, the data in its table is lost. These are actions that happen from time to time. I may want to refactor a DLT. I may want to stop updating some table but keep its data. I could also simply make a mistake, and the data would be lost.
Questions:
- Is there a way to keep the data if a DLT is removed?
- Why was this decision made, and is there a chance that it will be changed?
If not, I don't see any other option but to use DLT only with Hive Metastore and then ingest the data downstream into UC.
We are currently encountering the same issue. We came across a note in the Databricks documentation suggesting that data can be recovered after a pipeline has been deleted. However, the documentation does not explain how to actually perform this recovery.
https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#:~:text=The%20data%20can%20be%20recovered,confirm%20deletion%20of%20a%20pipeline.
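Until the recovery steps are documented, one possible precaution is to copy a table's data into an ordinary table before removing the pipeline or a table definition. Below is a minimal sketch of that idea, not an official procedure. The helper `backup_statement` and the table names are hypothetical, and it assumes the pipeline's table can be read with a plain `SELECT` from the compute where the statement is run.

```python
def quote_table_name(name: str) -> str:
    """Backtick-quote a three-level catalog.schema.table name."""
    parts = name.split(".")
    if len(parts) != 3 or not all(parts) or "`" in name:
        raise ValueError(f"expected catalog.schema.table, got {name!r}")
    return ".".join(f"`{part}`" for part in parts)


def backup_statement(source_table: str, backup_table: str) -> str:
    """Build a CREATE TABLE AS SELECT statement (hypothetical helper).

    The result copies the current contents of source_table into a new
    standalone table that is not owned by the pipeline.
    """
    src = quote_table_name(source_table)
    dst = quote_table_name(backup_table)
    return f"CREATE TABLE {dst} AS SELECT * FROM {src}"


# Table names here are made up for illustration only.
statement = backup_statement("main.sales.orders", "main.backup.orders_copy")
print(statement)
```

In a notebook where a `spark` session is available, the statement could then be run with `spark.sql(statement)`. This is a one-off snapshot, so it would need to be repeated right before any risky change, and it does not carry over the table history.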