I'm using Spark 2.4.4 with the AWS Glue Data Catalog.
In my Spark job, I need to create a database in Glue if it doesn't already exist. I'm using the following Spark SQL statement to do so:
spark.sql("CREATE DATABASE IF NOT EXISTS %s".format(hiveDatabase));
It works as expected in spark-shell: a database gets created in Glue.
But when I run the same code using spark-submit, the database is not created. Is there a commit/flush that I need to do when using spark-submit?
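On the commit/flush question: Spark executes DDL statements such as CREATE DATABASE eagerly when spark.sql is called, so no commit or flush step should be needed. The following is a minimal sketch that checks, from inside the submitted job, whether the session's own catalog sees the database immediately after the statement. The object name and the choice to read hiveDatabase from the first program argument are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object CreateDatabaseCheck {
  // Runs the same statement as in the question, then asks the session's
  // catalog whether the database exists. spark.sql runs DDL eagerly,
  // so there is nothing to commit or flush in between.
  def createAndCheck(spark: SparkSession, hiveDatabase: String): Boolean = {
    spark.sql("CREATE DATABASE IF NOT EXISTS %s".format(hiveDatabase))
    spark.catalog.databaseExists(hiveDatabase)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical: the database name arrives as the first program argument.
    val hiveDatabase = args(0)
    val spark = SparkSession.builder().appName("create-database-check").getOrCreate()
    val exists = createAndCheck(spark, hiveDatabase)
    // Also print which catalog this session uses ("hive" or "in-memory").
    val catalog = spark.conf.get("spark.sql.catalogImplementation")
    println(s"catalog=$catalog exists=$exists")
    spark.stop()
  }
}
```

If this prints exists=true but the database never shows up in Glue, the submitted job is talking to a different catalog than spark-shell.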
EDIT
I'm getting different results for SHOW DATABASES in spark-shell (first output) and spark-submit (second output):
+---------------------+
|databaseName         |
+---------------------+
|all                  |
|default              |
|hive-db              |
|navi-database-account|
|navi-par             |
|testdb               |
+---------------------+

+------------+
|databaseName|
+------------+
|default     |
+------------+
It looks like spark-submit is creating the database somewhere, but not in Glue.
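One likely explanation, assuming the job builds its session with SparkSession.builder(): spark-shell enables Hive support automatically when the Hive classes are on the classpath, so its catalog is the Hive metastore (here backed by Glue). A session built in a spark-submit job without enableHiveSupport() uses Spark's in-memory catalog instead, which starts out with only the default database, matching the second listing. The sketch below checks the catalog implementation and builds a Hive-enabled session; it assumes the cluster's Hive configuration already points at Glue, as it evidently does for spark-shell. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object CatalogDiagnostics {
  // Value of the static conf spark.sql.catalogImplementation for this
  // session: "hive" when Hive support is enabled, "in-memory" otherwise.
  def catalogImplementation(spark: SparkSession): String =
    spark.conf.get("spark.sql.catalogImplementation")

  // Builds the session the way spark-shell does when Hive classes are
  // available. enableHiveSupport() throws IllegalArgumentException if
  // the Hive classes are not on the classpath.
  def hiveEnabledSession(appName: String): SparkSession =
    SparkSession.builder()
      .appName(appName)
      .enableHiveSupport()
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = hiveEnabledSession("glue-catalog-check")
    println(s"catalogImplementation = ${catalogImplementation(spark)}")
    // With the Hive (Glue-backed) catalog, this should list the same
    // databases as spark-shell does.
    spark.sql("SHOW DATABASES").show(false)
    spark.stop()
  }
}
```

Note that spark.sql.catalogImplementation is fixed when the first session is created, so calling enableHiveSupport() on a builder after a session already exists has no effect on that session.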
I needed to add the following config: