The Use Case:
- Store versions of Large Datasets (CSV/Snowflake Tables) and query across versions
The Delta Lake documentation says that a Delta table retains its historical data unless we run the vacuum command, and that log files are deleted every 30 days.
Additional documentation states that we need both the log files and the data files to time travel.
Does this imply that we can only time travel 30 days?
But isn't Delta a file format? How would it automatically delete its logs?
If yes, what other open-source tools can handle querying across dataset versions?
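Just set the data and log retention settings to a very long period. The relevant table properties are delta.logRetentionDuration (30 days by default) and delta.deletedFileRetentionDuration.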
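As a rough sketch of what that could look like, the helpers below only build the SQL strings, so they run without Spark. The table name `my_dataset`, the 3650-day period, and the helper names are hypothetical; the two `delta.*` table properties are the retention settings Delta Lake documents. Running the statements assumes a SparkSession called `spark` that is already configured for Delta Lake, and a Delta version that supports the `VERSION AS OF` SQL syntax.

```python
def retention_sql(table_name: str, days: int) -> str:
    """Build an ALTER TABLE statement that extends both retention settings.

    table_name and days are illustrative inputs, not values from the question.
    """
    if days <= 0:
        raise ValueError("days must be positive")
    interval = f"interval {days} days"
    return (
        f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
        f"'delta.logRetentionDuration' = '{interval}', "
        f"'delta.deletedFileRetentionDuration' = '{interval}')"
    )


def time_travel_sql(table_name: str, version: int) -> str:
    """Build a query that reads one historical version of the table."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"SELECT * FROM {table_name} VERSION AS OF {version}"


# Assumed usage, given a Delta-enabled SparkSession named `spark`:
#   spark.sql(retention_sql("my_dataset", 3650))
#   old_df = spark.sql(time_travel_sql("my_dataset", 5))
```

Both settings matter: a longer log retention does not help if vacuum has already removed the data files for the version you want to read, so avoid vacuuming with a shorter retention than the history you need.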