Using the DataFrameWriterV2.overwrite() method to overwrite rows in an Iceberg table

I have an Iceberg table in AWS Glue that I write to with PySpark. On every write of my DataFrame, I need to overwrite only the rows that already exist in the table. I discovered the DataFrameWriterV2.overwrite() method and am trying to use it as follows:

df.sortWithinPartitions(F.to_date("ts"), "account_id").repartitionByRange(F.to_date("ts")).writeTo( my_table ).overwrite(F.col("id"))

I pass the id column because it identifies whether a row is a duplicate, that is, a row that can be overwritten in the table.

However, I always get TypeError: Column is not iterable.

I haven't been able to get it to work at all. Any ideas on how to overwrite a row in the table?

I expect rows in the destination table to be overwritten whenever the id value of a row in my DataFrame already exists in the table, so that the destination table ends up with no duplicate rows. Thank you.
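
From the documentation, overwrite() takes a boolean filter condition (for example F.col("id") == 1) and replaces the rows matching it, so it may not be the right tool for a per-row match on id. What I'm describing is closer to an upsert, which Iceberg exposes in Spark SQL as MERGE INTO. A minimal sketch of what I mean, assuming the Iceberg SQL extensions are enabled on the Spark session; build_merge_sql, the view name updates, and the table name glue_catalog.db.my_table are hypothetical names used only for illustration:

```python
def build_merge_sql(target: str, source_view: str, key: str = "id",
                    insert_new: bool = True) -> str:
    """Build a MERGE INTO statement that overwrites target rows whose key
    matches a row in the source view (hypothetical helper, not a Spark API)."""
    clauses = [
        f"MERGE INTO {target} AS t",
        f"USING {source_view} AS s",
        f"ON t.{key} = s.{key}",
        # Rows whose key already exists in the table are replaced.
        "WHEN MATCHED THEN UPDATE SET *",
    ]
    if insert_new:
        # Rows whose key is new are appended.
        clauses.append("WHEN NOT MATCHED THEN INSERT *")
    return "\n".join(clauses)


# Usage (requires a Spark session configured for Iceberg; names are assumed):
# df.createOrReplaceTempView("updates")
# spark.sql(build_merge_sql("glue_catalog.db.my_table", "updates"))
```

MERGE INTO expects at most one source row per id, so the DataFrame may need df.dropDuplicates(["id"]) before it is registered as a view.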

There are 0 answers