dbt macro - using *args/**kwargs

3.2k views Asked by At

Occasionally, our tables need to append multiple columns together to make a unique key. This can be done by doing something like:

select
    *,
    col1 || "_" || col2 as unique_key
from 
    my_table

This works but lends to a lack of uniformity amongst multiple analysts.

I would like to utilize pythons *args (i think jinja2 uses varargs) feature to make a macro that can take an arbitrary amount of arguments and create a unique key between all of them.

Ideal outcome:

select
    *,
    unique_key(col1, col1, ..., colN)
from 
    my_table
2

There are 2 answers

1
dylanbaker On BEST ANSWER

There is currently a macro in the dbt_utils package that does something similar, called surrogate key. It used to only use varargs and now also allows a list.

For the varargs portion, it does the following:

{%- for field in varargs %}
{%- set _ = field_list_xf.append(field) -%}
{%- endfor -%}

You can then join() or iterate through that list to do whatever you'd like. In the case of the macro, it does the following:

{%- for field in field_list_xf -%}

    {%- set _ = fields.append(
        "coalesce(cast(" ~ field ~ " as " ~ dbt_utils.type_string() ~ "), '')"
    ) -%}

    {%- if not loop.last %}
        {%- set _ = fields.append("'-'") -%}
    {%- endif -%}

{%- endfor -%}
0
Bilbottom On

To answer the "kwargs" part of your question (but for a different example), I'll modify an answer that I posted to a related question on args and kwargs in Jinja

The kwargs object will be a dictionary so if you want to use it you can iterate over it by accessing the items() method, like you can in Python. I have this in a generic import macro to streamline the top-level "import" CTEs:

{% macro import() %}
    {%- for key, value in kwargs.items() %}

    {{ key }} AS (
        SELECT *
        FROM {{ value }}
    ){%- if not loop.last -%},{%- endif -%}

    {% endfor %}
{% endmacro %}

This can be used in the following way:

WITH
    {{ import(
        payments=source("stripe", "payments"),
        customers=ref("stg_jaffle_shop__customers"),
        orders=ref("fct_orders"),
        bad_cte="jaffle_shop.pasties"
    ) }}

SELECT ...

This will compile into something like the following (depends on the model configs):

WITH

    payments AS (
        SELECT *
        FROM stripe.payments
    ),

    customers AS (
        SELECT *
        FROM jaffle_shop.customers
    ),

    orders AS (
        SELECT *
        FROM jaffle_shop.orders
    ),

    bad_cte AS (
        SELECT *
        FROM jaffle_shop.pasties
    )

SELECT ...

Note that the bad_cte argument is there to prove that this macro can take literal references in addition to source and model references, but that's generally a bad idea since a literal reference won't show up in dynamic content like the lineage diagram

Additionally, the advantage of setting up the imports is that it can be augmented to do things like use the INFORMATION_SCHEMA (or equivalent) to expand the * so that compiled scripts list all of their columns, which is useful for debugging and column lineage