Failed to get broadcast_1_piece0 of broadcast_1 in pyspark application

Question


868 views Asked by user2122292 At 20 December 2016 at 07:08

I am building an application on Apache Spark 2.0 with Python 3.4. It loads CSV files from HDFS (Hadoop 2.7) and computes KPIs from the CSV data.

The application randomly fails with the error "Failed to get broadcast_1_piece0 of broadcast_1" and then stops.

After a lot of searching on Google and Stack Overflow, the only remedy I found is to manually delete the files created by the Spark application from the /tmp directory. The error generally appears when an application has been running for a long time and is no longer responding properly, while its files are still in /tmp.
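A minimal sketch of how that manual cleanup could be made safer, by listing candidates before deleting anything. It assumes Spark's scratch directories live under /tmp (the default for `spark.local.dir`) and use the `spark-` and `blockmgr-` name prefixes; the function name and the 24-hour threshold are hypothetical choices for illustration.

```python
import os
import time

# Assumption: Spark scratch directories under spark.local.dir (default /tmp)
# are named with these prefixes.
SPARK_TMP_PREFIXES = ("spark-", "blockmgr-")


def find_stale_spark_dirs(tmp_root="/tmp", max_age_hours=24.0, now=None):
    """Return Spark scratch directories under tmp_root that have not been
    modified within the last max_age_hours. Nothing is deleted here."""
    now = time.time() if now is None else now
    cutoff = now - max_age_hours * 3600
    stale = []
    for entry in os.scandir(tmp_root):
        # Only consider real directories, not files or symlinks.
        if not entry.is_dir(follow_symlinks=False):
            continue
        if not entry.name.startswith(SPARK_TMP_PREFIXES):
            continue
        if entry.stat(follow_symlinks=False).st_mtime < cutoff:
            stale.append(entry.path)
    return sorted(stale)
```

The function only reports paths. Review the list and remove entries (for example with `shutil.rmtree`) only after confirming that no running Spark application still uses them.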

I don't declare any broadcast variables myself, but Spark may be creating them on its own.

In my case, the error occurs when the application tries to load a CSV file from HDFS.

I have captured low-level logs from my application and attached them here. I would appreciate suggestions or best practices for resolving the problem.

Sample (full details are attached):

    Traceback (most recent call last):
      File "/home/hadoop/development/kpiengine.py", line 258, in <module>
        df_ho_raw = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load(HDFS_BASE_URL + HDFS_WORK_DIR + filename)
      File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 147, in load
      File "/usr/local/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
      File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
      File "/usr/local/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
    py4j.protocol.Py4JJavaError: An error occurred while calling o44.load.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.26.7.192): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1


There is 1 answer

**李建飞** · Answer 1 · 2017-09-15T03:25:49+00:00


Your class should extend Serializable.

The error may come from your code or from the framework. To find out which, run one of the examples bundled with Spark:

$SPARK_HOME/examples/src/main/scala/org/apache/spark/examples/

If the examples run without errors, the problem is likely in your own code, so check that next.
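When checking your own code, one thing worth trying is a stripped-down version of the CSV load. The sketch below is a suggestion, not a confirmed fix for this error. It relies on two facts about Spark 2.0: a CSV reader is built in, so the external `com.databricks.spark.csv` format is not required, and `SparkSession.builder.getOrCreate()` reuses an existing session instead of creating a second one. The helper names and the application name are hypothetical.

```python
def get_session(app_name="kpiengine-check"):
    """Return the active SparkSession, creating one only if none exists."""
    # Imported lazily so this module can be imported on machines without Spark.
    from pyspark.sql import SparkSession
    return SparkSession.builder.appName(app_name).getOrCreate()


def load_csv_with_header(spark, path):
    """Load a CSV file whose first row is a header, using the CSV reader
    built into Spark 2.x (no external spark-csv package)."""
    return spark.read.option("header", "true").csv(path)
```

With the variables from the traceback, the call would look like `df_ho_raw = load_csv_with_header(get_session(), HDFS_BASE_URL + HDFS_WORK_DIR + filename)`. If this minimal load works reliably while the full application still fails, the cause is more likely in how the application creates or stops its Spark context than in the CSV read itself.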
