Is there a way to read files from a remote server using Arrow in C++?
Reading CSV or Parquet files from the local file system is very easy, but it seems that Arrow does not support reading files from a remote server given its IP address. Is there a way to achieve this, e.g. to read a subset of the columns of a Parquet file on a remote server (with a path like "ip://path/to/remote/file")? Thanks.
743 views. Asked by Raining.
There are 2 answers
pyarrow.dataset.dataset() has a filesystem argument through which it supports many remote file systems.
See the Arrow documentation for file systems. You can also pass in an fsspec file system, and fsspec provides a large number of implementations.
For example, if your Parquet file is sitting on a web server, you could use the fsspec HTTP file system:
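import fsspec.implementations.http
import pyarrow.dataset as ds

# Create an fsspec HTTP file system and let Arrow read through it.
http = fsspec.implementations.http.HTTPFileSystem()
# With the fsspec HTTP file system, the dataset path is the full URL.
d = ds.dataset('http://localhost:8000/test.parquet', filesystem=http)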
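Since the question asks about reading only a subset of the columns, here is a minimal sketch of how that could look with the same HTTP file system. The helper names build_http_url and read_remote_columns, the default port 8000, and the column names in the example call are my own assumptions for illustration; they are not part of Arrow or fsspec. It also assumes the file is actually served over HTTP from the remote machine.

```python
from urllib.parse import quote


def build_http_url(host, path, port=8000):
    """Build an HTTP URL for a file served from `host` (hypothetical helper)."""
    # Percent-encode the path, keeping the slashes between its segments.
    return f"http://{host}:{port}/{quote(path.lstrip('/'))}"


def read_remote_columns(url, columns):
    """Return a pyarrow.Table with only `columns` of the Parquet file at `url`."""
    # Imported here so build_http_url stays usable without pyarrow or fsspec.
    import fsspec.implementations.http
    import pyarrow.dataset as ds

    http = fsspec.implementations.http.HTTPFileSystem()
    dataset = ds.dataset(url, format="parquet", filesystem=http)
    # Column projection: only the requested columns end up in the table.
    return dataset.to_table(columns=columns)


# Example call (assumed host, file name, and column names):
# table = read_remote_columns(build_http_url("localhost", "test.parquet"), ["a", "b"])
```

The column selection happens in to_table(columns=...), so the rest of the answer's approach stays the same; only the final read changes.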
There is an open issue for native HTTP file system support in Arrow, if you would like to contribute or follow development: https://issues.apache.org/jira/browse/ARROW-7594
(By 'remote server' I assume you mean over HTTP(s) or similar. If you're looking for a custom client-server protocol, check out Arrow Flight.)