How to keep data after I remove a DLT pipeline or a DLT task in Unity Catalog

I was very surprised by the limitations and changes that come with using DLT with Unity Catalog. Of these, one of the scariest (and for me a dealbreaker) is the architectural decision to tightly couple a DLT pipeline to the data it produces. Previously, when I used DLT with Hive Metastore, I was free to change and delete pipelines without fearing that my data would be lost (I used unmanaged tables, setting the location explicitly in the DLT definition). That possibility no longer exists. Long story short: if I remove a DLT pipeline, all of its data is lost. If I remove a particular task, the data in its table is lost. These are actions that are done from time to time. I may want to refactor a pipeline. I may want to stop updating some table but keep its data. I could also make a mistake by accident and lose the data.

Questions:

  1. Is there a way to keep data if DLT is removed?
  2. Why is this decision made and is there a chance that it will be changed?

If not, I don't see any other option but to use DLT only with Hive Metastore and then ingest the data downstream into UC.

There is 1 answer

user3503929

We are currently encountering the same issue. We came across a note in the Databricks documentation suggesting that data can be recovered after a pipeline has been deleted. However, the documentation does not give detailed instructions on how to actually perform this recovery.

When Delta Live Tables is configured to persist data to Unity Catalog, the lifecycle of the table is managed by the Delta Live Tables pipeline. Because the pipeline manages the table lifecycle and permissions:

  - When a table is removed from the Delta Live Tables pipeline definition, the corresponding materialized view or streaming table entry is removed from Unity Catalog on the next pipeline update. The actual data is retained for a period of time so that it can be recovered if it was deleted by mistake. The data can be recovered by adding the materialized view or streaming table back into the pipeline definition.

https://docs.databricks.com/en/delta-live-tables/unity-catalog.html#:~:text=The%20data%20can%20be%20recovered,confirm%20deletion%20of%20a%20pipeline.
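
The quoted passage does not say how long the data is retained, so one cautious option may be to copy a table out of the pipeline before removing it from the definition. Below is a minimal sketch of that idea, not something taken from the Databricks documentation: it builds a plain `CREATE OR REPLACE TABLE ... AS SELECT` statement that snapshots a pipeline-managed table into a standalone table. The helper names (`snapshot_statement`, `snapshot`) and the table names in the usage note are hypothetical, and the sketch assumes the backup table lives in a schema that the pipeline does not manage.

```python
import re

# Assumption: names are simple three-level Unity Catalog identifiers
# (catalog.schema.table) without backticks or special characters.
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*){2}$")


def snapshot_statement(source_table: str, backup_table: str) -> str:
    """Build a CTAS statement that copies source_table into backup_table."""
    for name in (source_table, backup_table):
        if not _NAME.match(name):
            raise ValueError(f"expected catalog.schema.table, got {name!r}")
    if source_table == backup_table:
        raise ValueError("backup table must differ from the source table")
    return (
        f"CREATE OR REPLACE TABLE {backup_table} "
        f"AS SELECT * FROM {source_table}"
    )


def snapshot(source_table: str, backup_table: str, run_sql=print) -> str:
    """Build the statement and hand it to run_sql (defaults to printing it)."""
    statement = snapshot_statement(source_table, backup_table)
    run_sql(statement)
    return statement
```

In a Databricks notebook this could be called as `snapshot("main.sales.orders", "main.backup.orders_snapshot", run_sql=spark.sql)` before the table is dropped from the pipeline definition. The result is a one-time copy of the current rows: it does not carry over the table history or any streaming state, and it has to be re-run to pick up later changes.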