I am trying to write DataFrame data into a table in Azure SQL from Databricks using PySpark. The table dbo.test already exists in the database, and I am able to read from it before executing the write operation below.
testDf.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("truncate", "false") \
    .option("url", azure_sql_url) \
    .option("dbtable", "dbo.test") \
    .option("databaseName", database_name) \
    .option("user", username) \
    .option("password", password) \
    .option("encrypt", "true") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .option("bulkCopyBatchSize", 10000) \
    .option("bulkCopyTableLock", "true") \
    .option("bulkCopyTimeout", "6000000") \
    .save()
After executing this command, the following error is returned:
java.lang.NoSuchMethodError: org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.schemaString(Lorg/apache/spark/sql/Dataset;Ljava/lang/String;Lscala/Option;)Ljava/lang/String;
Surprisingly, the dbo.test table gets dropped.
Can someone help me understand why this is happening? The same code works fine in another environment.
The cause of the error is a version mismatch between Spark SQL and the Spark connector for SQL Server. Also, mode("overwrite") by default drops the table if it already exists and re-creates a new one, which is why dbo.test disappears; setting the truncate option to "true" makes Spark truncate the existing table instead (a sketch of this follows the version table below). Here are the supported connector versions for each Spark version (from the Microsoft docs referenced below):

Connector                                              Supported Spark version
com.microsoft.azure:spark-mssql-connector:1.0.2        Spark 2.4.x (Scala 2.11)
com.microsoft.azure:spark-mssql-connector_2.12:1.1.0   Spark 3.0.x (Scala 2.12)
com.microsoft.azure:spark-mssql-connector_2.12:1.2.0   Spark 3.1.x (Scala 2.12)
Resolution:
If you are using a Databricks runtime version greater than 10, you need to downgrade to Databricks runtime 9.1 LTS or lower and use the matching connector.
I downgraded the Databricks runtime to 7.3 LTS (Spark 3.0.1), then installed the appropriate library for Spark 3.0.x, which is com.microsoft.azure:spark-mssql-connector_2.12:1.1.0, and tried your code; it works fine.
My execution:
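A sketch of that execution, with a small hypothetical DataFrame standing in for the original testDf (connection variables as in the question):

# Hypothetical sample data in place of the original testDf
testDf = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "value"])

testDf.write.format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", azure_sql_url) \
    .option("dbtable", "dbo.test") \
    .option("user", username) \
    .option("password", password) \
    .save()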
OUTPUT:
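To verify the result in text form, read the table back with the same connection options (a sketch):

spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", azure_sql_url) \
    .option("dbtable", "dbo.test") \
    .option("user", username) \
    .option("password", password) \
    .load() \
    .show()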
Or you can directly use the generic JDBC source, dataframename.write.format("jdbc"). Sample code:
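A minimal sketch using the built-in JDBC source, assuming the Microsoft SQL Server JDBC driver is available on the cluster and the same connection variables as above:

testDf.write.format("jdbc") \
    .mode("overwrite") \
    .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
    .option("url", azure_sql_url) \
    .option("dbtable", "dbo.test") \
    .option("user", username) \
    .option("password", password) \
    .save()

Note that the same mode("overwrite")/truncate semantics apply here: without truncate set to "true", the table is dropped and re-created.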
You can also refer to a similar issue on GitHub or this similar SO thread.
Reference: https://learn.microsoft.com/en-us/sql/connect/spark/connector?view=sql-server-ver16