Snowflake JSON lateral subquery


I have the following in snowflake:

create or replace table json_tmp as select column1 as id, parse_json(column2) as c
    from VALUES (1,
                 '{"id": "0x1",
                   "custom_vars": [
                      { "key": "a", "value": "foo" },
                      { "key": "b", "value": "bar" }
                   ] }') v;

Based on the FLATTEN docs, I hoped to turn these into a table looking like this:

+-------+---------+-----+-----+
| db_id | json_id |  a  |  b  |
+-------+---------+-----+-----+
|   1   |   0x1   | foo | bar |
+-------+---------+-----+-----+

Here is the query I tried; it resulted in a SQL compilation error: "Object 'CUSTOM_VARS' does not exist."

select json_tmp.id as dbid,
    f.value:id as json_id,
    a.v,
    b.v
from json_tmp,
    lateral flatten(input => json_tmp.c) as f,
    lateral flatten(input => f.value:custom_vars) as custom_vars,
    lateral (select value:value as v from custom_vars where value:key = 'a') as a,
    lateral (select value:value as v from custom_vars where value:key = 'b') as b;

What exactly is the error here? Is there a better way to do this transformation?


There are 2 answers

Marcin Zukowski (BEST ANSWER)

Note - your solution doesn't actually perform any joins - flatten is a "streaming" operation, it "explodes" the input, and then selects the rows it wants. If you only have 2 attributes in the data, it should be reasonably fast. However, if not, it can lead to an unnecessary data explosion (e.g. if you have 1000s of attributes).

The fastest solution depends on how your data is structured exactly, and what you can assume about the input. For example, if you know that 'a' and 'b' are always in that order, you can obviously use

select 
    id as db_id, 
    c:id, 
    c:custom_vars[0].value, 
    c:custom_vars[1].value 
from json_tmp;

If you know that custom_vars is always 2 elements, but the order is not known, you could do e.g.

select 
    id as db_id, 
    c:id, 
    iff(c:custom_vars[0].key = 'a', c:custom_vars[0].value, c:custom_vars[1].value), 
    iff(c:custom_vars[0].key = 'b', c:custom_vars[0].value, c:custom_vars[1].value) 
from json_tmp;

If the size of custom_vars is unknown, you could create a JavaScript UDF like extract_key(custom_vars, key) that iterates over custom_vars and returns the value for the given key (or e.g. null or an empty string if not found).
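A minimal sketch of such a UDF, assuming the name extract_key and null-on-missing-key behavior (both illustrative, not part of the original answer):

```sql
-- Hypothetical helper: returns the value for the first matching key, else null.
create or replace function extract_key(cv variant, k varchar)
returns variant
language javascript
as $$
    // Snowflake exposes JavaScript UDF arguments in uppercase.
    for (var i = 0; i < CV.length; i++) {
        if (CV[i].key === K) {
            return CV[i].value;
        }
    }
    return null;  // key not found
$$;

-- Usage against the sample table:
select id as db_id,
    c:id as json_id,
    extract_key(c:custom_vars, 'a') as a,
    extract_key(c:custom_vars, 'b') as b
from json_tmp;
```

This avoids flattening entirely, at the cost of a per-row JavaScript invocation.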

Hope this helps. If not, please provide more details about your problem (data, etc).

jsharp

Update Nov 2019

There seems to be a function that does this sort of thing:

select json_tmp.id as dbid,
    json_tmp.c:id as json_id,
    object_agg(custom_vars.value:key, custom_vars.value:value):a as a,
    object_agg(custom_vars.value:key, custom_vars.value:value):b as b
from
    json_tmp,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') custom_vars
group by json_tmp.id, json_id;
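The two OBJECT_AGG calls above can also be computed once in a subquery, which keeps the aggregation in one place (a sketch against the same sample table; the alias names vars and agg are illustrative):

```sql
select dbid,
    json_id,
    vars:a as a,
    vars:b as b
from (
    -- Aggregate the key/value pairs into a single object per row.
    select json_tmp.id as dbid,
        json_tmp.c:id as json_id,
        object_agg(custom_vars.value:key, custom_vars.value:value) as vars
    from json_tmp,
        lateral flatten(input => json_tmp.c, path => 'custom_vars') custom_vars
    group by dbid, json_id
) agg;
```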

Original answer Sept 2017

The following query seems to work:

select json_tmp.id as dbid,
    json_tmp.c:id as json_id,
    a.value:value a,
    b.value:value b
from
    json_tmp,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') a,
    lateral flatten(input => json_tmp.c, path => 'custom_vars') b
where a.value:key = 'a' and b.value:key = 'b'
;

I'd prefer to filter in a subquery rather than in the join condition, so I'm still interested in seeing other answers.