How to delete rows from historical Delta table files on Databricks?


I have some historical tables from which I need to delete certain rows, because we are no longer allowed to keep that data. For audit purposes, the data also has to be removed from previous versions of the Delta table.

From what I've read, the VACUUM command should fit my use case, with a short retention period of 5 hours. I'm testing this, but the history doesn't go away after VACUUM, nor does the VACUUM operation get logged in the table history.

Steps:

  1. Delete the rows from the table.
DELETE FROM delta.`path_to_table` WHERE code = 20
  2. Vacuum the Delta table.
spark.conf.set("spark.databricks.delta.retentionDurationCheck.enabled", "false")

QUERY = "VACUUM delta.`path_to_table` RETAIN 5 HOURS"

result = spark.sql(QUERY)

  3. Check the history of the Delta table.

The table history is the same as before, and no VACUUM operation appears in it.
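For reference, the three steps above can be sketched as one script. This is a minimal sketch of what I'm running, not a fix: the helper names (`delete_sql`, `vacuum_sql`, `history_sql`, `run_steps`) are hypothetical, `spark` is assumed to be the active SparkSession in a Databricks notebook, and `path_to_table` is a placeholder. The optional `DRY RUN` clause is not part of my original steps; it only lists the files VACUUM would delete.

```python
def delete_sql(table_path: str, predicate: str) -> str:
    """Build the DELETE statement for a path-based Delta table."""
    return f"DELETE FROM delta.`{table_path}` WHERE {predicate}"


def vacuum_sql(table_path: str, retain_hours: int, dry_run: bool = False) -> str:
    """Build the VACUUM statement; DRY RUN only lists candidate files."""
    stmt = f"VACUUM delta.`{table_path}` RETAIN {retain_hours} HOURS"
    return stmt + " DRY RUN" if dry_run else stmt


def history_sql(table_path: str) -> str:
    """Build the statement that returns the table's commit history."""
    return f"DESCRIBE HISTORY delta.`{table_path}`"


def run_steps(spark, table_path: str, predicate: str, retain_hours: int):
    """Run delete -> vacuum -> history and return the history result.

    `spark` is assumed to be an existing SparkSession (hypothetical wiring).
    """
    # Allow a retention period shorter than the default safety threshold.
    spark.conf.set(
        "spark.databricks.delta.retentionDurationCheck.enabled", "false"
    )
    # Step 1: delete the rows from the current table version.
    spark.sql(delete_sql(table_path, predicate))
    # Step 2: remove unreferenced data files older than the retention period.
    spark.sql(vacuum_sql(table_path, retain_hours))
    # Step 3: read back the table history.
    return spark.sql(history_sql(table_path))
```

In a notebook this would be called as `run_steps(spark, "path_to_table", "code = 20", 5).show(truncate=False)`.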

Any idea why this happens? Any suggestions on what I should try?

Thank you.
