I am developing a process that writes to different Iceberg tables with different partitioning. Previously, before writing data into Iceberg with Spark, we would repartition and sort the DataFrame by the partition columns first. Now I can't do that. I found that setting the Iceberg table property write.distribution-mode = 'hash' should help, but it did not work when I tried it.
Does anyone know why the property does not work, or is there another way to make Spark repartition automatically when writing to Iceberg?
write.distribution-mode - the table property that controls how data is distributed across tasks during parallel writes.
https://iceberg.apache.org/docs/latest/configuration/
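For example, the property can be set with plain Spark SQL DDL. A minimal sketch; the catalog, database, and table names (my_catalog.db.events) are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-write").getOrCreate()

# Ask Iceberg to hash-distribute rows by partition values on write.
# The table identifier below is a placeholder for your own table.
spark.sql(
    "ALTER TABLE my_catalog.db.events "
    "SET TBLPROPERTIES ('write.distribution-mode' = 'hash')"
)
```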
df.repartition - explicitly repartitions the DataFrame yourself before writing.
https://sparkbyexamples.com/pyspark/pyspark-repartition-usage/
So, I'm hoping you can use this approach. In Python:
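A minimal sketch, assuming an Iceberg table my_catalog.db.events partitioned by an event_date column; the table name, column name, and source path are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-write").getOrCreate()

# Read the source data (path is a placeholder).
df = spark.read.parquet("/path/to/source")

# Repartition by the table's partition column so rows for the same
# partition are colocated in the same Spark task, then sort within
# each task so the writer produces fewer, larger files per partition.
(
    df.repartition("event_date")
      .sortWithinPartitions("event_date")
      .writeTo("my_catalog.db.events")  # Spark 3 DataFrameWriterV2 API
      .append()
)
```

This reproduces manually what write.distribution-mode = 'hash' is meant to do automatically, so it works even when the table property has no effect in your setup.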