Conversion of a pandas dataframe to a pyspark dataframe with pyarrow optimization does not work


When I try to convert a pandas dataframe to a pyspark one like this

df = spark.createDataFrame(pd.DataFrame({'a': [1,2], 'b': [4,5]}))

I get the following error:

AttributeError: 'ChunkedStream' object has no attribute 'closed'

I also set ARROW_PRE_0_15_IPC_FORMAT=1, as recommended by the Spark documentation for pyarrow>=0.15.0, but it didn't help.
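For what it's worth, this variable only takes effect if it is set before the driver and its Python workers start; setting it in the shell after the session exists does nothing. A minimal sketch of setting it from Python (the `spark.executorEnv.*` forwarding shown in the comment assumes a standard Spark 2.4 setup):

```python
import os

# Must be set in the driver's environment *before* the SparkSession
# (and its Python worker processes) are created.
os.environ["ARROW_PRE_0_15_IPC_FORMAT"] = "1"

# On a cluster, the same variable also needs to reach the executors.
# Spark forwards environment variables via the "spark.executorEnv." config
# prefix; the builder call is shown as a comment so this snippet runs
# without a Spark installation:
#
# spark = (SparkSession.builder
#          .config("spark.executorEnv.ARROW_PRE_0_15_IPC_FORMAT", "1")
#          .getOrCreate())

print(os.environ["ARROW_PRE_0_15_IPC_FORMAT"])
```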

pyspark version: 2.4.0
pyarrow version: 0.13.0 (the error also happens with pyarrow versions 0.16.0 and 1.0.1)
pandas version: 1.0.3
java version: 1.8.0_201
python version: 3.7.4

P.S.: if I set 'spark.sql.execution.arrow.fallback.enabled' to 'true', the conversion works fine, but without the pyarrow optimization. Unfortunately, since my pandas dataframe is quite big, I need the pyarrow optimization to work.
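For reference, this is how I enable Arrow and the fallback on the session builder (a sketch only; it requires a local pyspark install, and the Spark 2.4 config key names are the pre-3.0 ones):

```python
from pyspark.sql import SparkSession

# Spark 2.4 config keys; in Spark 3.x these moved to
# "spark.sql.execution.arrow.pyspark.enabled" and
# "spark.sql.execution.arrow.pyspark.fallback.enabled".
spark = (SparkSession.builder
         .config("spark.sql.execution.arrow.enabled", "true")
         .config("spark.sql.execution.arrow.fallback.enabled", "true")
         .getOrCreate())
```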
