PyArrow OSError: [WinError 193] %1 is not a valid win32 application


My OS is Windows 10 64-bit, and I use Anaconda with Python 3.8 64-bit. I am trying to develop a Hadoop File System 3.3 client with the PyArrow module. Installing PyArrow with conda on Windows 10 succeeds:

> conda install -c conda-forge pyarrow

But connecting to HDFS 3.3 with PyArrow throws an error:

import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9000)

The error is:

Traceback (most recent call last):
  File "C:\eclipse-workspace\PythonFredProj\com\aaa\fred\hdfs3-test.py", line 14, in <module>
    fs = pa.hdfs.connect(host='localhost', port=9000)
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 208, in connect
    fs = HadoopFileSystem(host=host, port=port, user=user,
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 38, in __init__
    _maybe_set_hadoop_classpath()
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 136, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob(hadoop_bin)
  File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 163, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 489, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Python-3.8.3-x64\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid win32 application

I installed Visual C++ 2015 on Windows 10, but the same error still appears.


1 Answer

Joseph Hwang answered:

This is my solution.

  1. Before starting with PyArrow, Hadoop 3 has to be installed on your Windows 10 64-bit machine, and the Hadoop installation path has to be added to the Path environment variable (see the check below).
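
A quick sanity check of my own (not part of the original answer) to confirm the Hadoop launchers resolve through Path:

    import shutil

    # On Windows, these should resolve to hadoop.cmd / hdfs.cmd in the
    # Hadoop installation's bin directory once Path is set correctly.
    for exe in ("hadoop", "hdfs"):
        print(exe, "->", shutil.which(exe) or "NOT FOUND on Path")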

  2. Install PyArrow 3.0 (the version is important; it has to be 3.0):

    pip install pyarrow==3.0
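
To confirm the pinned version took effect (a check of my own, not in the original steps):

    import pyarrow

    # Should print 3.0.0 if the pinned version was installed correctly.
    print(pyarrow.__version__)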

  3. Create a PyDev module in the Eclipse PyDev perspective. Sample code is shown below:

    from pyarrow import fs

    # Connect to the HDFS NameNode listening on localhost:9000 and
    # print information about the root directory.
    hadoop = fs.HadoopFileSystem("localhost", port=9000)
    print(hadoop.get_file_info('/'))
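
Once that connection succeeds, a FileSelector can list directory entries; this extension is my own illustration using the same pyarrow.fs API:

    from pyarrow import fs

    # List the entries directly under the HDFS root (non-recursive).
    hadoop = fs.HadoopFileSystem("localhost", port=9000)
    selector = fs.FileSelector("/", recursive=False)
    for info in hadoop.get_file_info(selector):
        print(info.path, info.type)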

  4. Choose the PyDev module you created and open [Properties] (Alt + Enter).

  5. Click [Run/Debug Settings], choose the PyDev module, and click the [Edit] button.

  6. In the [Edit Configuration] window, select the [Environment] tab.

  7. Click the [Add] button.

  8. You have to create two environment variables, "CLASSPATH" and "LD_LIBRARY_PATH" (a scripted equivalent is sketched after these two items):

  - CLASSPATH: In a command prompt, execute the following command:

    hdfs classpath --glob

Copy the returned value and paste it into the Value text field (the returned value is one long string, but copy all of it).


  - LD_LIBRARY_PATH: Insert the path of the libhdfs.so file under Hadoop 3 (in my case "C:\hadoop-3.3.0\lib\native") into the Value text field.


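For reference, the same two variables can be set from Python itself; this is my own sketch, assuming hdfs.cmd is reachable through Path and that the native-library directory matches your installation:

    import os
    import subprocess

    # Capture the output of `hdfs classpath --glob`, as done manually above.
    # shell=True lets cmd.exe resolve the hdfs.cmd launcher on Windows.
    os.environ["CLASSPATH"] = subprocess.check_output(
        "hdfs classpath --glob", shell=True, text=True
    ).strip()

    # Directory containing Hadoop's native libraries (adjust to your install).
    os.environ["LD_LIBRARY_PATH"] = r"C:\hadoop-3.3.0\lib\native"

    from pyarrow import fs

    # Both variables must be in place before the filesystem is created.
    hadoop = fs.HadoopFileSystem("localhost", port=9000)
    print(hadoop.get_file_info("/"))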

  9. OK! The PyArrow 3.0 configuration is set. You can now connect to Hadoop 3 on Windows 10 from Eclipse PyDev.
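
As a final smoke test (my own addition; the file path is just an example), writing a small file to HDFS and reading it back confirms the setup end to end:

    from pyarrow import fs

    # Round-trip a small payload through HDFS using pyarrow.fs streams.
    hadoop = fs.HadoopFileSystem("localhost", port=9000)
    with hadoop.open_output_stream("/tmp/pyarrow_smoke_test.txt") as f:
        f.write(b"hello from pyarrow")
    with hadoop.open_input_stream("/tmp/pyarrow_smoke_test.txt") as f:
        print(f.read())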