Delta Table read performance when using delta-rs Python API?


I'm trying to read a Delta table using the delta-rs library (Python).

The table has millions of records, and we want to read it frequently from a REST API call, fetching only the specific records that each request asks for.

So I was evaluating the delta-rs library. Because the table has millions of records, read performance is poor.

It reads the entire table and converts it to a Pandas DataFrame before I can filter based on the request.

Is there a way to read only the records I need, instead of reading the entire table and then filtering (for example, column pruning or predicate pushdown)?

Update: I followed this issue (https://github.com/delta-io/delta-rs/issues/631) and was able to get good performance by converting the DeltaTable to a PyArrow Dataset and then using DuckDB to filter.


There are 0 answers