I am getting the below error when trying to write data to my existing Hive table. insertInto works fine while I am processing small amounts of data, but at month end, when the data grows larger, this error occurs every day. My PySpark executor/driver memory config is:
.config('spark.executor.cores', '2') \
.config("spark.cores.max", '10') \
.config("spark.executor.memory", '20g') \
.config("spark.driver.memory", '50g')\
combined_df.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").insertInto("process.sample_table", overwrite=True)
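As far as I understand, insertInto writes in the table's existing format and matches columns by position, so the explicit format() should be redundant and the call above is equivalent to this shorter form:

combined_df.write.insertInto("process.sample_table", overwrite=True)

The full stack trace is: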
File "/usr/local/lib/python3.11/site-packages/pyspark/sql/readwriter.py", line 1448, in insertInto
self._jwrite.insertInto(tableName)
File "/usr/local/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
return_value = get_return_value(
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
return f(*a, **kw)
^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
raise Py4JJavaError(
23/12/02 00:16:34 DEBUG OutputCommitCoordinator: Commit denied for stage=1137.0, partition=1: stage already marked as completed.
py4j.protocol.Py4JJavaError: An error occurred while calling o929.insertInto.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 24 in stage 1137.0 failed 4 times, most recent failure: Lost task 24.3 in stage 1137.0 (TID 141976) (10.241.253.154 executor 9): org.apache.spark.SparkException:
Error from python worker:
Exception ignored error evaluating path:
Traceback (most recent call last):
File "<frozen getpath>", line 660, in <module>
OSError: failed to make path absolute
Fatal Python error: error evaluating path
Python runtime state: core initialized
Current thread 0x00007fa2e0563200 (most recent call first):
<no Python frame>
PYTHONPATH was:
/opt/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip:/opt/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip:/opt/spark-3.4.1-bin-hadoop3/jars/spark-core_2.12-3.4.1.jar:/opt/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip:/opt/spark-3.4.1-bin-hadoop3/python:
java.lang.NullPointerException