My OS is Windows 10 64-bit, and I use Anaconda with Python 3.8 64-bit. I am trying to develop a Hadoop File System 3.3 client with the PyArrow module. Installing PyArrow with conda on Windows 10 succeeds:
> conda install -c conda-forge pyarrow
But connecting to HDFS 3.3 with PyArrow throws errors like the ones below:
import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9000)
The errors are
Traceback (most recent call last):
File "C:\eclipse-workspace\PythonFredProj\com\aaa\fred\hdfs3-test.py", line 14, in <module>
fs = pa.hdfs.connect(host='localhost', port=9000)
File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 208, in connect
fs = HadoopFileSystem(host=host, port=port, user=user,
File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 38, in __init__
_maybe_set_hadoop_classpath()
File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 136, in _maybe_set_hadoop_classpath
classpath = _hadoop_classpath_glob(hadoop_bin)
File "C:\Python-3.8.3-x64\lib\site-packages\pyarrow\hdfs.py", line 163, in _hadoop_classpath_glob
return subprocess.check_output(hadoop_classpath_args)
File "C:\Python-3.8.3-x64\lib\subprocess.py", line 411, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "C:\Python-3.8.3-x64\lib\subprocess.py", line 489, in run
with Popen(*popenargs, **kwargs) as process:
File "C:\Python-3.8.3-x64\lib\subprocess.py", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Python-3.8.3-x64\lib\subprocess.py", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid win32 application
I installed Visual C++ 2015 on Windows 10, but the same errors are still shown.
This is my solution.
Before starting with PyArrow, Hadoop 3 has to be installed on your Windows 10 64-bit machine, and the Hadoop installation path has to be added to the Path environment variable.
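A quick way to check both of these from Python is a minimal sketch like the one below; it only assumes that HADOOP_HOME points at the Hadoop 3 folder and that %HADOOP_HOME%\bin was added to Path.
import os
import shutil

# Assumption: HADOOP_HOME points at the Hadoop 3 installation folder
# and %HADOOP_HOME%\bin has been added to Path.
print("HADOOP_HOME:", os.environ.get("HADOOP_HOME"))

# shutil.which returns None when the hadoop launcher cannot be found on Path.
print("hadoop on Path:", shutil.which("hadoop"))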
Install PyArrow 3.0 (the version is important; it has to be 3.0):
pip install pyarrow==3.0
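To make sure the interpreter used by Eclipse really picks up that build, you can print the version:
import pyarrow

# Expect "3.0.0" here if the pinned version was installed.
print(pyarrow.__version__)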
Create a PyDev module in the Eclipse PyDev perspective. The sample code is like below:
from pyarrow import fs

hadoop = fs.HadoopFileSystem("localhost", port=9000)
print(hadoop.get_file_info('/'))
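If the connection works, the print call shows a FileInfo entry for the HDFS root directory ('/').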
Choose the PyDev module you created and click [Properties (Alt + Enter)].
Click [Run/Debug Settings]. Choose the PyDev module and click the [Edit] button.
In the [Edit Configuration] window, select the [Environment] tab.
Click the [Add] button.
You have to create 2 environment variables: "CLASSPATH" and "LD_LIBRARY_PATH".
Copy the returned values and paste them into the Value text field (the returned values are long strings, but copy them all).
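If you do not want to go through the Eclipse dialogs, the same two variables can also be set at the top of the script before the filesystem is created. This is a minimal sketch: it assumes the CLASSPATH value is the long string printed by running hadoop classpath --glob in a command prompt, and that LD_LIBRARY_PATH points at Hadoop's native library folder (the path below is only an example; use your own installation path).
import os

# Assumptions (example values -- replace with your own installation):
# - CLASSPATH holds the long string printed by "hadoop classpath --glob".
# - LD_LIBRARY_PATH points at the folder with Hadoop's native libraries,
#   for example %HADOOP_HOME%\lib\native.
os.environ["CLASSPATH"] = "<paste the output of: hadoop classpath --glob>"
os.environ["LD_LIBRARY_PATH"] = r"C:\hadoop-3.3.0\lib\native"

from pyarrow import fs

hadoop = fs.HadoopFileSystem("localhost", port=9000)
print(hadoop.get_file_info('/'))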