I'm trying to read a range of rows (say rows 1000 to 5000) from a parquet file. I've tried pandas with the fastparquet engine and even pyarrow, but can't seem to find any option to do so.
Is there any way to achieve this?
I don't think the current pyarrow version (2.0) supports it.
The closest you can get to slicing the file is the `filters` argument of `read_table`.
If your dataset has a column `foo` from which the required rows can be selected, pass a filter on that column.
If you happen to have a column `id` corresponding to the row index, you can filter on a range of `id` values instead.