PySpark error while writing data to an existing Hive table


I am getting the error below when trying to write data to my existing Hive table. insertInto works fine when I process small amounts of data, but at month end, when the data grows larger, this error occurs every day. My PySpark executor/driver memory configuration is:

.config("spark.executor.cores", "2") \
.config("spark.cores.max", "10") \
.config("spark.executor.memory", "20g") \
.config("spark.driver.memory", "50g") \



combined_df.write.format("org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat").insertInto("process.sample_table", overwrite="True")
  File "/usr/local/lib/python3.11/site-packages/pyspark/sql/readwriter.py", line 1448, in insertInto
    self._jwrite.insertInto(tableName)
  File "/usr/local/lib/python3.11/site-packages/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pyspark/errors/exceptions/captured.py", line 169, in deco
    return f(*a, **kw)
           ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
23/12/02 00:16:34 DEBUG OutputCommitCoordinator: Commit denied for stage=1137.0, partition=1: stage already marked as completed.
py4j.protocol.Py4JJavaError: An error occurred while calling o929.insertInto.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 24 in stage 1137.0 failed 4 times, most recent failure: Lost task 24.3 in stage 1137.0 (TID 141976) (10.241.253.154 executor 9): org.apache.spark.SparkException: 
Error from python worker:
  Exception ignored error evaluating path:
  Traceback (most recent call last):
    File "<frozen getpath>", line 660, in <module>
  OSError: failed to make path absolute
  Fatal Python error: error evaluating path
  Python runtime state: core initialized
  
  Current thread 0x00007fa2e0563200 (most recent call first):
    <no Python frame>
PYTHONPATH was:
  /opt/spark-3.4.1-bin-hadoop3/python/lib/pyspark.zip:/opt/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip:/opt/spark-3.4.1-bin-hadoop3/jars/spark-core_2.12-3.4.1.jar:/opt/spark-3.4.1-bin-hadoop3/python/lib/py4j-0.10.9.7-src.zip:/opt/spark-3.4.1-bin-hadoop3/python:
java.lang.NullPointerException