I'm trying to read a range of data (say rows 1000 to 5000) from a parquet file. I've tried pandas with the fastparquet engine and even pyarrow, but can't seem to find any option to do so.
Is there any way to achieve this?
I don't think the current pyarrow version (2.0) supports it. The closest you can get to slicing the file is the `filters` argument of `read_table`. If your dataset has a column `foo` that lets you select the rows you need, use something like the sketch below.
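A minimal sketch, assuming a file `data.parquet` and a placeholder filter value `'bar'` (neither comes from the question):

```python
import pyarrow.parquet as pq

# Keep only the rows where foo == 'bar'.
# use_legacy_dataset=False makes pyarrow filter individual rows
# instead of only pruning whole row groups.
table = pq.read_table(
    'data.parquet',
    filters=[('foo', '=', 'bar')],
    use_legacy_dataset=False,
)
df = table.to_pandas()
```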
If you happen to have a column `id` corresponding to the row index, you can filter on it directly to read the 1000 to 5000 range.
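Again a sketch with an assumed `data.parquet`; tuples in a single list are combined with AND:

```python
import pyarrow.parquet as pq

# Read only the rows whose id falls in the requested range.
table = pq.read_table(
    'data.parquet',
    filters=[('id', '>=', 1000), ('id', '<=', 5000)],
    use_legacy_dataset=False,
)
df = table.to_pandas()
```

Note this is column-based filtering, not positional slicing, so it only works if such an index-like column was written into the file.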