Segmentation Fault 11 when listing HDFS files


Apologies if I'm not asking this correctly; I don't know enough about the stack to be more precise. All I know is that I get a Segmentation fault: 11 whenever I try to list a directory containing more than one file on HDFS, using PyArrow with the libhdfs3 driver in Python 3:

Python 3.7.1 | packaged by conda-forge | (default, Nov 13 2018, 10:30:07) 
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin

Here is the code I'm running:

import pyarrow as pa
fs = pa.hdfs.connect('localhost', 8020, driver='libhdfs3')

This connects to HDFS fine, so I then run:

>>> fs.ls("/user/dan/", detail=False)
['/user/dan/testing'] # this directory has 2 files in it

>>> fs.ls("/user/dan/testing", detail=False)
Segmentation fault: 11

Interestingly, if I delete one of the files ...

>>> fs.ls("/user/dan/testing", detail=False)
['/user/dan/testing/[email protected]']

... it works and does not segfault.

Since I don't know which part of my environment is causing this (Python? PyArrow? libhdfs3?), I'm not sure what to search for when troubleshooting.
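One way I could probably narrow this down is to run the failing call in a child interpreter with Python's built-in faulthandler enabled, so the crash doesn't kill my session and at least a Python-level traceback gets written to stderr. This is only a minimal sketch: run_isolated and LISTING_SNIPPET are names I made up for illustration, and the snippet simply repeats the calls shown above. It won't say whether the bug is in PyArrow or libhdfs3, only where in the Python code the crash happens and which signal killed the process.

```python
import subprocess
import sys

# Hypothetical snippet: the same calls as in the question, executed in a child process.
LISTING_SNIPPET = """
import pyarrow as pa
fs = pa.hdfs.connect('localhost', 8020, driver='libhdfs3')
print(fs.ls('/user/dan/testing', detail=False))
"""


def run_isolated(snippet):
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    Returns the CompletedProcess so the caller can inspect returncode,
    stdout and stderr. A crash in the child does not affect this process.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_isolated(LISTING_SNIPPET)
    if result.returncode < 0:
        # On POSIX, a negative return code means the child was killed by that signal.
        print("child killed by signal", -result.returncode)
    else:
        print("child exited with code", result.returncode)
    print(result.stdout)
    # With faulthandler enabled, a fatal signal dumps the Python traceback here.
    print(result.stderr)
```

If the child is killed by a signal, the stderr dump should show the last Python frame that was executing, which would be something concrete to include in a bug report.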

Any thoughts or recommendations are greatly appreciated!
