I'm trying to dynamically determine warnings and errors on freshness checks, specified in dbt sources.yml, based on the median and std dev of the "synced_at" column of the underlying source.
To accomplish this, I thought I might try to call a macro in the freshness block of the sources.yml file, like so:
# sources.yml
...
    tables:
      - name: appointment_type
        freshness:
          error_after:
            count: test_macro()
            period: hour
...
Where:
{%- macro test_macro(this) -%}
    {# /*
    The idea is {{ this.table }} would parameterize a query,
    going over the same column name for all sources, _fivetran_synced,
    and spit out the calculated values I want. This makes me feel like
    it needs to be a prehook, that somehow stores the value in a var,
    and that is accessed in the source.yml, instead of calling it directly.
    In this case a trivial integer is attempted to be returned, just as an example.
    */ #}
    {{ return(24) }}
{%- endmacro -%}
However, this results in a type error; presumably the macro is not called at all. Wrapping it in Jinja quotes also returns an error.
I am curious if passing dynamic values to freshness checks can currently be achieved in any way?
It isn't possible today to call macros from .yml files, for precisely this reason: dbt needs to be able to statically parse those files and validate internal objects (including resource properties like source freshness) before it runs any queries against the database.
I think you could maybe hack this by overriding the collect_freshness macro to return, instead of simply max(synced_at), a timestamp that is Z-score diffed from current_timestamp, normalized based on all Fivetran max(synced_at) timestamps. It feels tricky but possible.
At the same time, I'd gently push back on your larger goal here. We think of source freshness as something that should be prescriptive. You get to tell Fivetran how often you want it to sync data, and add freshness blocks to test those expectations. You can run ad hoc queries like the one you envision above to determine if those expectations are reasonable. Obviously, some tables are updated infrequently or unpredictably, but I find it's more useful to override or remove these tables' freshness expectations than to add significant complexity on their account.
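For illustration, a prescriptive setup along those lines might look roughly like the sketch below. The source name fivetran_source, the table rarely_updated_table, and the 12 and 24 hour thresholds are placeholders, not anything taken from the question. The loaded_at_field assumes the _fivetran_synced column mentioned there.

```yaml
# sources.yml (sketch; names and thresholds are hypothetical)
version: 2

sources:
  - name: fivetran_source            # hypothetical source name
    loaded_at_field: _fivetran_synced
    # Source-level defaults, inherited by every table below
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: appointment_type       # uses the source-level thresholds
      - name: rarely_updated_table   # hypothetical table with unpredictable updates
        freshness: null              # opt this table out of freshness checks
```

The thresholds stay static so dbt can parse the file without touching the database. Any statistics on sync intervals would be computed separately and used only to decide which numbers to write down here.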