How to update nested object in delta-rs

216 views Asked by At

Given the below code, I'm able to update root-level fields as e.g. "age". But how to update fields inside my_struct?

The attempt with struct("a": "1", "b": "2") gives

DeltaTable error: This feature is not implemented: Unsupported SQL json operator Colon

I also tried named_struct, it gives

Generic DeltaTable error: Error during planning: Invalid function 'named_struct'.

I've been searching a couple of hours for docs/examples, but without luck so far.

import pandas as pd
from deltalake import write_deltalake, DeltaTable

df = pd.DataFrame.from_records([
     {"name": "Alice", "age": 25, "gender": "Female", "my_struct": {"a": 1, "b": 2}},
     {"name": "Bob", "age": 30, "gender": "Male", "my_struct": {"a": 3, "b": 4}}
])

write_deltalake('./db/my_table', df)

dt = DeltaTable('./db/my_table')
dt.update({"age": "42"})  # this is working
dt.update({"my_struct": 'struct("a": "1", "b": "2")'})  # not working
print(dt.to_pandas())

My pyproject.toml

[tool.poetry]
name = "delta-test"

[tool.poetry.dependencies]
python = "^3.10"
deltalake = "^0.12.0"
pandas = "1.5"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
1

There are 1 answers

0
Maciej Skorski On

What is wrong is that the update accepts SQL instructions rather than Python-like datatypes. Hence the error on the syntax 'struct("a": "1", "b": "2")' which is not a proper SQL call.

In anticipation of the narrowed question "what a SQL-compatible struct looks like", let's emphasize that this depends on the SQL dialect used and may not be that clear. In Databricks for example we could use named_struct('a', 1, 'b', 2). Here we deal with Python bindings for delta-rs, and from the older version of the documentation

DeltaTable.update() predicates and updates are all in string format. The predicates and expressions, are parsed into Apache Datafusion expressions.

it appears that we may have to align to DataFusion. In turn, according to their docs, the syntax struct(1,2) should work. The provided example recognizes this syntax (progress), but still complains about data types mismatch. I debuged and escalated the problem to the maintainers and we need to wait for their answer because it's up to them to document their API.