Does the R arrow package have anything like the random access capability of the fst package?


Our team is looking to integrate more of our R and Python work. As part of this effort, we are trying to move away from fst files (written with the fst package), which as far as I know cannot be read in Python without interfacing with R (see "Is it possible to import .fst file in python"), and toward feather files (written with the arrow package), which Python can read natively.

The thing I'm running into is that we frequently use the random access functionality from fst (http://www.fstpackage.org/#random-access). For example, we may have a table in an fst file with 100 million rows and 40 columns, about 4 GB in size. The table is sorted by a column MktDate (which contains Dates). With fst, I can read in just the MktDate column (which is quick and doesn't take much memory), identify the rows I need for some date range, and then read in just that portion of the fst file. Is there any way to do that with feather? I've thought about splitting the data across the file system, so that a big file with, say, 5000 dates is instead stored as 5000 dated files, but I'd prefer to stick with just one file if possible.
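To make the access pattern concrete, here is a minimal Python sketch of the row-lookup step, using only the standard library. The `row_range` helper and the toy `mkt_date` list are hypothetical stand-ins for illustration and are not part of fst or arrow. It only shows how a sorted date column maps a date range to a contiguous block of rows; the actual partial read is the part I'm asking about.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def row_range(sorted_dates, start, end):
    """Return (first_row, n_rows) for rows with start <= date <= end.

    Assumes sorted_dates is in ascending order; first_row is zero-based.
    """
    first = bisect_left(sorted_dates, start)   # first row with date >= start
    stop = bisect_right(sorted_dates, end)     # one past the last row with date <= end
    return first, stop - first


# Toy stand-in for the sorted MktDate column: 10 days, 3 rows per day.
base = date(2020, 1, 1)
mkt_date = [base + timedelta(days=d) for d in range(10) for _ in range(3)]

first, n_rows = row_range(mkt_date, date(2020, 1, 3), date(2020, 1, 4))
print(first, n_rows)  # 6 6
```

With fst, an offset and length like these correspond to the `from` and `to` arguments of `read_fst` (which are 1-based and inclusive, so the zero-based values above would need shifting). What I'm looking for is an equivalent way to read only that block of rows from a feather file.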


There are 0 answers