I'm trying to connect to a Hadoop cluster via pyarrow's HdfsClient / hdfs.connect().
I noticed pyarrow's have_libhdfs3() function, which returns False.
How does one go about getting the required HDFS support for pyarrow? I understand there's a conda command for installing libhdfs3, but I need to make it work through a "vanilla" route that doesn't involve conda.
If it matters, the files I'm interested in reading are Parquet files.
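
For reference, this is roughly the call I'm attempting once libhdfs3 is available (the host, port and path below are placeholders, and the driver keyword is just what I expect from the pyarrow docs):

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Placeholder host/port -- substitute your NameNode's address.
    fs = pa.hdfs.connect('namenode-host', 8020, driver='libhdfs3')

    # Open the Parquet file through the HDFS filesystem and read it into a table.
    with fs.open('/path/to/data.parquet', 'rb') as f:
        table = pq.read_table(f)

    df = table.to_pandas()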
EDIT:
The creators of the hdfs3 library maintain a package repository from which libhdfs3 can be installed:
On Ubuntu this worked for me:
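Roughly these steps (the repository line is the one the install docs listed at the time I did this, so it may have changed since):

    # Add the apt repository that hosts libhdfs3 (from the hdfs3 install docs)
    echo "deb https://dl.bintray.com/wangzw/deb trusty contrib" | \
        sudo tee /etc/apt/sources.list.d/bintray-wangzw-deb.list
    sudo apt-get install -y apt-transport-https
    sudo apt-get update

    # Install the library and its development headers
    sudo apt-get install -y libhdfs3 libhdfs3-dev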
It should work on other Linux distributions as well, using the appropriate package manager. Taken from:
http://hdfs3.readthedocs.io/en/latest/install.html