How to compress parquet file with zstandard using pandas


I'm using pandas to convert DataFrames to .parquet files with this command:

df.to_parquet(file_name, engine='pyarrow', compression='gzip')

I need to use Zstandard as the compression algorithm, but the function above seems to accept only gzip, snappy, and brotli. Is there a way to use zstd with this function? If not, how can I do it with another package? I tried the zstandard package, but it seems to accept only bytes-like objects.


There are 4 answers

Levi Sands On

I usually use zstandard as my compression algorithm for my dataframes.

This is the code I use (a bit simplified) to write those parquet files:

import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa

parquetFilename = "test.parquet"

df = pd.DataFrame(
    {
        "num_legs": [2, 4, 8, 0],
        "num_wings": [2, 0, 0, 0],
        "num_specimen_seen": [10, 2, 1, 8],
    },
    index=["falcon", "dog", "spider", "fish"],
)

df = pa.Table.from_pandas(df)
pq.write_table(df, parquetFilename, compression="zstd")

And to read these parquet files:

import pyarrow.parquet as pq

parquetFilename = "test.parquet"

df = pq.read_table(parquetFilename)
df = df.to_pandas()


Finally, a shameless plug for a blog post I wrote about the speed-vs-space trade-off of Zstandard and snappy compression in parquet files using pyarrow. It is relevant to your question and includes some more "real world" code examples of reading and writing parquet files with Zstandard. I will actually be writing a follow-up soon, too; if you're interested, let me know.

John On

You can actually just use

df.to_parquet(file_name, engine='pyarrow', compression='zstd')

Note: Only pyarrow supports Zstandard compression, fastparquet does not.

Reading is even easier, since you don't have to name the compression algorithm:

df = pd.read_parquet(file_name)

As of pandas 1.5.3 this was documented only in the backend, although it has worked since pandas 1.4.0. The missing documentation in the interface has been fixed in the current development version.

Istvan On

It seems it is not supported yet:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_parquet.html

compression : {'snappy', 'gzip', 'brotli', None}, default 'snappy'
    Name of the compression to use. Use None for no compression.

A. West On

Dependencies: %pip install "pandas[parquet,compression]>=1.4"

Code: df.to_parquet(filepath, compression='zstd')

Documentation

  • Installed by "parquet": pyarrow, the default parquet/feather engine; fastparquet also exists.
  • Installed by "compression": Zstandard is mentioned in the docs only from pandas>=1.4, and in to_parquet from pandas>=2.1.