I'm using pandas to convert DataFrames to .parquet files with this command:
df.to_parquet(file_name, engine='pyarrow', compression='gzip')
I need to use zstandard as the compression algorithm, but the function above accepts only gzip, snappy, and brotli. Is there a way to include zstd in this function? If not, how can I do that with other packages? I tried with zstandard, but it seems to accept only bytes-like objects.
I usually use zstandard as my compression algorithm for my dataframes.
This is the code I use (a bit simplified) to write those parquet files:
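A minimal sketch of what this can look like, going through pyarrow directly (the example DataFrame and file name here are illustrative, not the original code):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Convert the DataFrame to an Arrow table and write it with zstd compression.
table = pa.Table.from_pandas(df)
pq.write_table(table, "data.parquet", compression="zstd")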
And to read these parquet files:
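A matching sketch for reading (the codec is recorded in the parquet metadata, so nothing zstd-specific is needed on read; the file name is again illustrative):

import pyarrow.parquet as pq

# The compression codec is stored in the file metadata, so it does not need to be specified when reading.
table = pq.read_table("data.parquet")
df = table.to_pandas()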
For more details, see these sites:
Finally, a shameless plug for a blog post I wrote. It is about the speed vs space trade-off of zstandard and snappy compression in parquet files using pyarrow. It is relevant to your question and includes some more "real world" code examples of reading and writing parquet files with zstandard. I will actually be writing a follow-up soon too; if you're interested, let me know.