I am on the Cloudera platform, trying to use a pandas UDF in PySpark, and I am getting the error below:

PyArrow >= 0.8.0 must be installed; however, it was not found.

Installing pyarrow 0.8.0 on the platform will take time. Is there any workaround to use a pandas UDF without installing pyarrow cluster-wide? I can install it in my personal Anaconda environment; is it possible to export the conda environment and use it in PySpark?
Pandas UDFs need pyarrow, but you can pack your virtualenv (or conda environment) and ship it to your PySpark workers, without installing custom packages like pyarrow on every machine of your platform.

To use a virtualenv this way, simply follow the venv-pack package's Spark instructions: https://jcristharif.com/venv-pack/spark.html
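As a rough sketch of that workflow (following the linked venv-pack documentation): build the environment locally, pack it into an archive, and pass the archive to `spark-submit` so Spark distributes and unpacks it on each executor. The archive name, the `environment` alias, and `my_pandas_udf_job.py` are placeholders; depending on your deploy mode (e.g. YARN cluster mode) the Python path may need to be set via a `spark.yarn.appMasterEnv` conf instead of the environment variable shown here.

```shell
# Build a virtualenv locally with the packages the workers need.
python -m venv pyspark_venv
source pyspark_venv/bin/activate
pip install 'pyarrow>=0.8.0' pandas

# Pack the environment into a relocatable archive.
pip install venv-pack
venv-pack -o pyspark_venv.tar.gz

# Ship the archive with the job. The "#environment" suffix tells Spark
# to unpack it under the alias "environment" in each executor's
# working directory, so the packed interpreter can be used there.
PYSPARK_PYTHON=./environment/bin/python \
spark-submit \
  --archives pyspark_venv.tar.gz#environment \
  my_pandas_udf_job.py
```

For a conda environment, the analogous tool is conda-pack, and the `--archives` mechanism is the same.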