Given the below code, I'm able to update root-level fields as e.g. "age". But how to update fields inside my_struct
?
The attempt with struct("a": "1", "b": "2")
gives
DeltaTable error: This feature is not implemented: Unsupported SQL json operator Colon
I also tried named_struct
, it gives
Generic DeltaTable error: Error during planning: Invalid function 'named_struct'.
I've been searching a couple of hours for docs/examples, but without luck so far.
import pandas as pd
from deltalake import write_deltalake, DeltaTable
df = pd.DataFrame.from_records([
{"name": "Alice", "age": 25, "gender": "Female", "my_struct": {"a": 1, "b": 2}},
{"name": "Bob", "age": 30, "gender": "Male", "my_struct": {"a": 3, "b": 4}}
])
write_deltalake('./db/my_table', df)
dt = DeltaTable('./db/my_table')
dt.update({"age": "42"}) # this is working
dt.update({"my_struct": 'struct("a": "1", "b": "2")'}) # not working
print(dt.to_pandas())
My pyproject.toml
[tool.poetry]
name = "delta-test"
[tool.poetry.dependencies]
python = "^3.10"
deltalake = "^0.12.0"
pandas = "1.5"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
What is wrong is that the update accepts SQL instructions rather than Python-like datatypes. Hence the error on the syntax
'struct("a": "1", "b": "2")'
which is not a proper SQL call.In anticipation of the narrowed question "what a SQL-compatible struct looks like", let's emphasize that this depends on the SQL dialect used and may not be that clear. In Databricks for example we could use
named_struct('a', 1, 'b', 2)
. Here we deal with Python bindings for delta-rs, and from the older version of the documentationit appears that we may have to align to DataFusion. In turn, according to their docs, the syntax
struct(1,2)
should work. The provided example recognizes this syntax (progress), but still complains about data types mismatch. I debuged and escalated the problem to the maintainers and we need to wait for their answer because it's up to them to document their API.