Using the DataFrameWriterV2.overwrite() method to overwrite rows in an Iceberg table

443 views Asked by Antonio Pintus At 20 October 2023 at 08:17

I have an Iceberg table in AWS Glue that I write to with PySpark. On every write of my DataFrame, I need to overwrite only the rows that already exist in the table. I discovered the DataFrameWriterV2.overwrite() method and I'm trying to use it as follows:

(
    df.sortWithinPartitions(F.to_date("ts"), "account_id")
    .repartitionByRange(F.to_date("ts"))
    .writeTo(my_table)
    .overwrite(F.col("id"))
)

because my id column identifies duplicate rows, and those rows can be overwritten in the table.

However, I always get a TypeError: Column is not iterable.

I couldn't get it to work at all. Any ideas on how to overwrite a row in the table?

I expect every row in the destination table to be overwritten when its id value matches the id of a row in my DataFrame, so that the destination table contains no duplicate rows. Thank you.
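For reference, the behavior described above (replace a row when its id already exists, otherwise add it) is usually expressed as an upsert. As far as the PySpark documentation describes it, overwrite() takes a filter condition selecting the rows to replace, not a key column, so it may not be the right fit here. The following is only a minimal sketch of one possible alternative, assuming the Spark session has the Iceberg SQL extensions enabled (so that MERGE INTO is available) and that the DataFrame holds at most one row per id. The helper names build_merge_sql and upsert_by_key, the temporary view name incoming_rows, and the table name in the usage note are hypothetical and not part of any library.

```python
def build_merge_sql(target_table, source_view, key="id"):
    """Build a MERGE INTO statement that upserts source rows into the target by key.

    Hypothetical helper for illustration; it only assembles a SQL string.
    """
    return (
        f"MERGE INTO {target_table} AS t\n"
        f"USING {source_view} AS s\n"
        f"ON t.{key} = s.{key}\n"
        # Rows whose key already exists in the target are replaced.
        "WHEN MATCHED THEN UPDATE SET *\n"
        # Rows with a new key are appended; drop this clause to touch
        # only rows that already exist.
        "WHEN NOT MATCHED THEN INSERT *"
    )


def upsert_by_key(spark, df, target_table, key="id", source_view="incoming_rows"):
    """Run the merge for a DataFrame (assumes Iceberg SQL extensions are enabled)."""
    # Expose the DataFrame to Spark SQL under a temporary view name.
    df.createOrReplaceTempView(source_view)
    return spark.sql(build_merge_sql(target_table, source_view, key))
```

With an existing session and DataFrame, this would be called as upsert_by_key(spark, df, "glue_catalog.db.my_table"), where the table name is a placeholder. The source and target are assumed to have matching column names, since UPDATE SET * and INSERT * rely on that.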

There are 0 answers
