SQL data versioning in DuckDB

362 views Asked by At

I'm using MonetDB in a project and currently evaluating moving to DuckDB. As part of that I'm also reevaluating how we do versioning of data and if there's a better way to do it in DuckDB.

We manage data versions by having tables structured like this:

create table data(
  row int,
  key varchar(50),
  value varchar(50),
  version int
);

create table data_version(
  row int,
  next_version int
);

and then querying like this (to show a surface of all data up to and including version 2):

select d.key, d.value from data d
    left join data_version dv on dv.row = d.row
    where d.version <= 2 and (dv.next_version > 2 or dv.next_version is null);
  

Mininal working example here

This has the advantage of being append only (no table updates, just inserts) and seems to be quite performant. Bulk loading can be tricky because you have to keep track of what was already written in order to update the data_version table, but it's not too bad.

DuckDB has a lot of great functionality above and beyond standard SQL (like window funtions) and I'm wondering does this mean there's a better way to do versioning of data? I'm hoping someone more familiar with DuckDB might know. (Maybe there's just a better way to do versioning anyway!)

(Note, the example above isn't really showing off why we need a column oriented database but in that data table there will be lots of other columns we perform grouping style queries on, with the versioning clause)

0

There are 0 answers