Retroactively add partitions to Parquet files?

Question

Asked by lollerskates on 27 October 2023 at 20:18

I have a Spark job that uses Apache Hudi to write Parquet files into our AWS S3 data lake. I have a fairly large dataset (about 20M rows and growing) to which I would like to add a new partition column. Is this possible with my existing dataset, or do I need to restart my Spark job to re-create all the Parquet files with the new partition configuration?

I am on Spark 3.3.2 and Hudi 0.13.1.

**parisni** · Answer 1 · 2023-10-28T17:44:12+00:00

As of the current Hudi versions (<= 0.14), yes: you have to rewrite the whole table with the new partition scheme.
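
For reference, a full rewrite could look roughly like the PySpark-style sketch below: read the existing table, drop Hudi's meta columns, and write everything to a new location with the new partition fields. The helper names (`build_hudi_options`, `rewrite_with_new_partitions`), the choice of `ComplexKeyGenerator` and the `bulk_insert` operation are assumptions for illustration, not the answerer's method; check the option keys against the documentation for your Hudi version.

```python
# Sketch only: helper names are hypothetical; the option keys are Hudi Spark
# datasource configs, but verify them against your Hudi version.
HUDI_META_COLUMNS = [
    "_hoodie_commit_time",
    "_hoodie_commit_seqno",
    "_hoodie_record_key",
    "_hoodie_partition_path",
    "_hoodie_file_name",
]


def build_hudi_options(table_name, record_key, precombine_field, partition_fields):
    """Write options for the new table, partitioned by `partition_fields`."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Multiple partition fields are passed as a comma-separated list
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        # Assumption: ComplexKeyGenerator, which supports multi-field partition paths
        "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
        # bulk_insert skips the index lookup an upsert does, which suits a one-off full rewrite
        "hoodie.datasource.write.operation": "bulk_insert",
    }


def rewrite_with_new_partitions(spark, src_path, dst_path, options):
    """Read the existing Hudi table and write it out again under the new scheme."""
    df = spark.read.format("hudi").load(src_path)
    # Drop the old meta columns; the writer populates them afresh, including
    # _hoodie_partition_path computed from the new partition fields
    df = df.drop(*HUDI_META_COLUMNS)
    (df.write.format("hudi")
       .options(**options)
       .mode("overwrite")
       .save(dst_path))
```

Usage would be along the lines of `rewrite_with_new_partitions(spark, "s3://bucket/old_table", "s3://bucket/new_table", build_hudi_options("events", "id", "ts", ["dt", "region"]))`, where the paths, table name and field names are placeholders. Writing to a new path rather than in place keeps the old table readable until the new one has been verified. If the new partition value is derived (for example a date from a timestamp), add it with `withColumn` before the write.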

The main blocker is that the Parquet files contain the partition path in one of Hudi's internal metadata columns (`_hoodie_partition_path`). You could manually modify some files (such as hoodie.properties), recreate the metadata table from scratch, and so on, but at the end of the day you would also need to rewrite the Parquet files to overwrite that column.
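
To see why editing table-level files alone is not enough, here is a simplified pure-Python model (not Hudi code) of how a key generator turns partition fields into a path string. It assumes hive-style paths (`field=value`), and `partition_path` is a hypothetical helper. The value stored in each row under the old scheme no longer matches what the new scheme computes for that same row:

```python
def partition_path(record, fields, hive_style=True):
    """Simplified model of how a key generator builds a partition path string."""
    parts = []
    for field in fields:
        value = str(record[field])
        parts.append(f"{field}={value}" if hive_style else value)
    return "/".join(parts)


# One row as written under the old scheme (partitioned by dt only);
# the path is baked into the row's _hoodie_partition_path column
row = {"id": 42, "dt": "2023-10-27", "region": "eu"}
stored = partition_path(row, ["dt"])
row["_hoodie_partition_path"] = stored

# What the new scheme (dt + region) would compute for the same row
expected = partition_path(row, ["dt", "region"])

print(stored)              # dt=2023-10-27
print(expected)            # dt=2023-10-27/region=eu
print(stored == expected)  # False
```

Every existing row carries a stale value like `stored`, and it lives inside the data files, so only rewriting those files changes it.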

Otherwise you will end up with no support for deletes, and possibly other complications.
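
One way to picture the delete problem is a toy, partition-scoped index (assuming a non-global index, where the record-key lookup is limited to the partition the key generator computes). This is an illustrative model with hypothetical names (`index`, `delete`), not Hudi's implementation: a delete computed with the new scheme looks in a partition where the record was never written, so it silently finds nothing.

```python
def partition_path(record, fields):
    """Hive-style partition path, e.g. dt=2023-10-27/region=eu (simplified model)."""
    return "/".join(f"{f}={record[f]}" for f in fields)


# Toy partition-scoped index: partition path -> record keys stored there.
# Both rows were written under the old scheme (dt only).
index = {}
for rec in [{"id": 1, "dt": "2023-10-27", "region": "eu"},
            {"id": 2, "dt": "2023-10-27", "region": "us"}]:
    index.setdefault(partition_path(rec, ["dt"]), set()).add(rec["id"])


def delete(rec, fields):
    """Return True if the key was found (and removed) in the partition the scheme points to."""
    keys = index.get(partition_path(rec, fields), set())
    if rec["id"] in keys:
        keys.remove(rec["id"])
        return True
    return False


to_delete = {"id": 1, "dt": "2023-10-27", "region": "eu"}
print(delete(to_delete, ["dt", "region"]))  # False: dt=2023-10-27/region=eu holds nothing
print(delete(to_delete, ["dt"]))            # True: found under the old path
```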
