Py4JJavaError while trying to get data from Snowflake into Databricks


I am trying to get data from a Snowflake table into a Spark DataFrame in Databricks. Here is my current code.

from pyspark.sql import SparkSession
from pyspark.sql.functions import *
from pyspark.sql.types import *
import pyspark.pandas as pypd
import pandas as pd
import re

options = {
  "sfUrl": sfURL,
  "sfUser": sfUser,
  "sfPassword": sfPassword,
  "sfDatabase": sfDatabase,
  "sfSchema": sfSchema,
  "sfWarehouse": sfWarehouse,
  "sfRole": sfRole
}
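For reference, the sf* variables are plain strings set earlier in the notebook, pulled from Databricks secrets roughly like this (the scope and key names below are placeholders, not my real ones):

# Placeholder scope/key names; real values redacted
sfURL = dbutils.secrets.get(scope="my-scope", key="sf-url")
sfUser = dbutils.secrets.get(scope="my-scope", key="sf-user")
sfPassword = dbutils.secrets.get(scope="my-scope", key="sf-password")
sfDatabase = "MY_DB"
sfSchema = "MY_SCHEMA"
sfWarehouse = "MY_WH"
sfRole = "MY_ROLE"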

# `query` holds my SQL text as a plain string (redacted here)
pydf = spark.read \
  .format("snowflake") \
  .options(**options) \
  .option("query", query) \
  .load()

The code above runs without error, but when I call pydf.show() to look at the table, I get this error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-10-52-231-139.ec2.internal executor driver): net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Max retry reached for the download of #chunk0 (Total chunks: 2) retry=10, error=net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver encountered communication error. Message: Received close_notify during handshake.

And here is a truncated traceback:

Py4JJavaError                             Traceback (most recent call last)
File <command-3910608059704003>, line 1
----> 1 pydf.show()

File /databricks/spark/python/pyspark/instrumentation_utils.py:48, in _wrap_function.<locals>.wrapper(*args, **kwargs)
     46 start = time.perf_counter()
     47 try:
---> 48     res = func(*args, **kwargs)
     49     logger.log_success(
     50         module_name, class_name, function_name, time.perf_counter() - start, signature
     51     )
     52     return res

File /databricks/spark/python/pyspark/sql/dataframe.py:934, in DataFrame.show(self, n, truncate, vertical)
    928     raise PySparkTypeError(
    929         error_class="NOT_BOOL",
    930         message_parameters={"arg_name": "vertical", "arg_type": type(vertical).__name__},
    931     )
    933 if isinstance(truncate, bool) and truncate:
--> 934     print(self._jdf.showString(n, 20, vertical))
    935 else:
    936     try:

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322, in JavaMember.__call__(self, *args)
   1316 command = proto.CALL_COMMAND_NAME +\
   1317     self.command_header +\
   1318     args_command +\
   1319     proto.END_COMMAND_PART
   1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
   1323     answer, self.gateway_client, self.target_id, self.name)
   1325 for temp_arg in temp_args:
   1326     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
    186 def deco(*a: Any, **kw: Any) -> Any:
    187     try:
--> 188         return f(*a, **kw)
    189     except Py4JJavaError as e:
    190         converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o710.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (ip-10-52-231-139.ec2.internal executor driver): net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Max retry reached for the download of #chunk0 (Total chunks: 2) retry=10, error=net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver encountered communication error. Message: Received close_notify during handshake.
...
Caused by: javax.net.ssl.SSLProtocolException: Received close_notify during handshake
...
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3578)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3510)
    at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3499)
...
Caused by: net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver internal error: Max retry reached for the download of #chunk0 (Total chunks: 2) retry=10, error=net.snowflake.client.jdbc.SnowflakeSQLLoggedException: JDBC driver encountered communication error. Message: Received close_notify during handshake.
...
Caused by: javax.net.ssl.SSLProtocolException: Received close_notify during handshake
...
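For what it's worth, I understand that spark.read ... .load() is lazy: nothing is actually fetched from Snowflake until an action runs, which is why the failure only surfaces at show(). I assume any other action would hit the same result download, e.g.:

pydf.count()          # any action forces the query to run and the results to download
pydf.limit(1).show()  # much smaller result; not sure whether it would still hit the chunk error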

My code worked fine three weeks ago, but it suddenly stopped working. The cluster is an i3.4xlarge running Databricks Runtime 13.3 LTS (Apache Spark 3.4.1, Scala 2.12), with Unity Catalog enabled and Photon acceleration. I really cannot tell what is going wrong here. The "Received close_notify during handshake" message makes me suspect that something in the network path is tearing down the TLS connection, rather than anything in the query itself, but I don't know how to confirm that. Do I need a newer runtime, or a larger instance type?
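I'm also not sure how to narrow this down. Would a plain TLS handshake test from the driver node tell me anything? This is what I had in mind (the host below is a placeholder for the one in sfURL; I realize the result chunks are downloaded from Snowflake's cloud-storage stage rather than this host, so this would only test part of the path):

import socket, ssl

host = "myaccount.snowflakecomputing.com"  # placeholder for the host in sfURL
ctx = ssl.create_default_context()
with socket.create_connection((host, 443), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=host) as tls:
        print(tls.version())  # e.g. "TLSv1.3" if the handshake succeeds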
