I am building an application on Apache Spark 2.0.0 with Python 3.4 that loads CSV files from HDFS (Hadoop 2.7) and computes some KPIs from that CSV data.
The application randomly fails with the error "Failed to get broadcast_1_piece0 of broadcast_1" and then stops.
After a lot of searching on Google and Stack Overflow, the only fix I found was to manually delete the files that the Spark application created in the /tmp directory. The error generally seems to occur when an application has been running for a long time and is no longer responding properly, while its related files are still in /tmp.
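Before deleting anything from /tmp by hand, it can help to list which directories look like leftover Spark scratch space. The sketch below is only an illustration: the `spark-` and `blockmgr-` name prefixes, the one-day age threshold, and the helper name `find_stale_spark_dirs` are assumptions, not something taken from the logs, so check them against what your own cluster actually writes.

```python
import os
import time

# Assumed name prefixes of Spark scratch directories; verify on your own cluster.
SPARK_DIR_PREFIXES = ("spark-", "blockmgr-")


def find_stale_spark_dirs(base_dir="/tmp", max_age_seconds=24 * 3600, now=None):
    """Return Spark-looking directories in base_dir older than max_age_seconds."""
    now = time.time() if now is None else now
    stale = []
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        # Skip plain files and anything that does not match the assumed prefixes.
        if not os.path.isdir(path) or not name.startswith(SPARK_DIR_PREFIXES):
            continue
        # Age is judged by the directory's last modification time.
        if now - os.path.getmtime(path) > max_age_seconds:
            stale.append(path)
    return stale


if __name__ == "__main__":
    for path in find_stale_spark_dirs():
        # Only prints candidates; review the list before removing anything.
        print(path)
```

The function only reports candidates and deletes nothing, so it is safe to run while an application is still active. Once the list has been reviewed, each path could be removed with shutil.rmtree.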
I don't declare any broadcast variables myself, but maybe Spark is creating them on its own.
In my case, the error occurs when the application tries to load a CSV file from HDFS.
I have captured low-level logs for my application and attached them here. I would appreciate any suggestions or best practices that help me resolve the problem.
Sample (details are attached):
Traceback (most recent call last):
  File "/home/hadoop/development/kpiengine.py", line 258, in <module>
    df_ho_raw = sqlContext.read.format('com.databricks.spark.csv').options(header='true').load(HDFS_BASE_URL + HDFS_WORK_DIR + filename)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 147, in load
  File "/usr/local/spark/python/lib/py4j-0.10.1-src.zip/py4j/java_gateway.py", line 933, in __call__
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/spark/python/lib/py4j-0.10.1-src.zip/py4j/protocol.py", line 312, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o44.load.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, 172.26.7.192): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_1_piece0 of broadcast_1
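Since the failure is intermittent and happens at the load call, one possible stopgap is to retry the load when the error text mentions a missing broadcast piece. This is a minimal sketch of a workaround, not a fix for the root cause: the helper name `load_with_retry`, the attempt count, and the delay are all hypothetical, and it assumes the string form of the raised error contains the Java-side message, as it does in the traceback above.

```python
import time

# Substring taken from the error message in the traceback.
BROADCAST_ERROR = "Failed to get broadcast_"


def load_with_retry(loader, attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call loader(); retry only when the error mentions a missing broadcast piece."""
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except Exception as exc:
            # Re-raise unrelated errors at once, and give up after the last attempt.
            if BROADCAST_ERROR not in str(exc) or attempt == attempts:
                raise
            # Linear backoff: wait a little longer before each further try.
            sleep(delay_seconds * attempt)


# Hypothetical usage with the names from the traceback:
# df_ho_raw = load_with_retry(
#     lambda: sqlContext.read.format('com.databricks.spark.csv')
#     .options(header='true')
#     .load(HDFS_BASE_URL + HDFS_WORK_DIR + filename))
```

Matching on the message text keeps the retry narrow, so genuine problems such as a wrong path still fail immediately. If the retries also fail, the cause is more likely to be in the cluster state than in this single call.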
You should make your class extend Serializable.
This could be an error in your code or in the framework; you can test which one it is.
If the framework is OK, you should check your code.