I have the following in snowflake:
create or replace table json_tmp as select column1 as id, parse_json(column2) as c
from VALUES (1,
'{"id": "0x1",
"custom_vars": [
{ "key": "a", "value": "foo" },
{ "key": "b", "value": "bar" }
] }') v;
Based on the FLATTEN docs, I hoped to turn these into a table looking like this:
+-------+---------+-----+-----+
| db_id | json_id | a   | b   |
+-------+---------+-----+-----+
| 1     | 0x1     | foo | bar |
+-------+---------+-----+-----+
Here is the query I tried; it resulted in a SQL compilation error: "Object 'CUSTOM_VARS' does not exist."
select json_tmp.id as dbid,
f.value:id as json_id,
a.v,
b.v
from json_tmp,
lateral flatten(input => json_tmp.c) as f,
lateral flatten(input => f.value:custom_vars) as custom_vars,
lateral (select value:value as v from custom_vars where value:key = 'a') as a,
lateral (select value:value as v from custom_vars where value:key = 'b') as b;
What exactly is the error here? Is there a better way to do this transformation?
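(For comparison, here is one pattern commonly used for this kind of pivot in Snowflake — a sketch, not necessarily the best approach for your data. It flattens only the `custom_vars` array and folds the key/value pairs back into columns with conditional aggregation; the `::string` casts are assumptions about the value types. The compilation error itself likely comes from the last two lateral subqueries: `custom_vars` there is a `FLATTEN` alias, not a table object you can name in a `FROM` clause.)

```sql
-- Flatten only the custom_vars array, then pivot the key/value
-- pairs back into columns with conditional aggregation.
select t.id           as db_id,
       t.c:id::string as json_id,
       max(iff(f.value:key::string = 'a', f.value:value::string, null)) as a,
       max(iff(f.value:key::string = 'b', f.value:value::string, null)) as b
from json_tmp t,
     lateral flatten(input => t.c:custom_vars) f
group by t.id, t.c:id;
```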
Note - your solution doesn't actually perform any joins. FLATTEN is a "streaming" operation: it "explodes" the input and then selects the rows it wants. If you only have 2 attributes in the data, it should be reasonably fast. If not, however, it can lead to an unnecessary data explosion (e.g. if you have 1000s of attributes).
The fastest solution depends on how exactly your data is structured and what you can assume about the input. For example, if you know that 'a' and 'b' are always in that order, you can obviously use positional access into the array.
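That is, with a fixed order you can index into `custom_vars` directly, with no FLATTEN at all (a sketch; the `::string` casts are assumptions):

```sql
-- If 'a' is always first and 'b' always second, index directly.
select id           as db_id,
       c:id::string as json_id,
       c:custom_vars[0].value::string as a,
       c:custom_vars[1].value::string as b
from json_tmp;
```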
If you know that `custom_vars` is always 2 elements but the order is not known, you could do e.g. an `IFF` that checks the first element's key and picks the matching element.

If the size of `custom_vars` is unknown, you could create a JavaScript function like `extract_key(custom_vars, key)` that would iterate over `custom_vars` and return `value` for the found `key` (or e.g. `null` or an empty string if not found).

Hope this helps. If not, please provide more details about your problem (data, etc.).
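The `extract_key` helper described above might be sketched as a Snowflake JavaScript UDF like this (a hypothetical implementation; note that argument names are uppercased inside the JS body):

```sql
create or replace function extract_key(custom_vars variant, key varchar)
returns variant
language javascript
as $$
  // Arguments arrive uppercased in the JS body (CUSTOM_VARS, KEY).
  for (var i = 0; i < CUSTOM_VARS.length; i++) {
    if (CUSTOM_VARS[i].key === KEY) {
      return CUSTOM_VARS[i].value;  // value for the matching key
    }
  }
  return null;  // key not found
$$;

select id           as db_id,
       c:id::string as json_id,
       extract_key(c:custom_vars, 'a')::string as a,
       extract_key(c:custom_vars, 'b')::string as b
from json_tmp;
```

This keeps the scan to a single pass over each array regardless of how many attributes it contains.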