In my deployment.yaml file, I have defined a static cluster as follows:
custom:
  basic-cluster-props: &basic-cluster-props
    spark_version: "11.2.x-scala2.12"

  basic-static-cluster: &basic-static-cluster
    new_cluster:
      <<: *basic-cluster-props
      num_workers: 1
      node_type_id: "Standard_DS3_v2"
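The tasks reference this definition via the YAML anchor, along these lines (workflow and task names are shortened here for illustration):

environments:
  default:
    workflows:
      - name: "my-workflow"
        tasks:
          - task_key: "create-transactions-view"
            <<: *basic-static-cluster
            python_wheel_task:
              package_name: "my_package"
              entry_point: "create_view_task"
          - task_key: "read-transactions-view"
            depends_on:
              - task_key: "create-transactions-view"
            <<: *basic-static-cluster
            python_wheel_task:
              package_name: "my_package"
              entry_point: "read_view_task"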
I use this for all of my tasks. In one of the tasks, I save a DataFrame using:
transactions.createOrReplaceGlobalTempView("transactions")
And in another task (which depends on the previous task), I try to read the temporary view like this:
global_temp_db = session.conf.get("spark.sql.globalTempDatabase")
# Load wallet features
transactions = session.sql(f"""SELECT *
FROM {global_temp_db}.transactions""")
But I get the error:
AnalysisException: Table or view not found: global_temp.transactions; line 2 pos 43;
'Project [*]
+- 'UnresolvedRelation [global_temp, transactions], [], false
Both tasks run within the same SparkSession, so why can it not find my global temp view?
Unfortunately this won't work unless you're using a cluster-reuse feature. With new_cluster, each task launches its own cluster and therefore its own Spark application, and global temp views are scoped to a single Spark application, so a view created in one task isn't visible to the other.
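If cluster reuse is an option for you, the usual way to get it is a shared job cluster, so that all tasks run against the same Spark application. A rough sketch of what that could look like in the deployment file (assuming your dbx / Jobs API version supports job_clusters and job_cluster_key; names are illustrative):

workflows:
  - name: "my-workflow"
    job_clusters:
      - job_cluster_key: "shared-cluster"
        new_cluster:
          <<: *basic-cluster-props
          num_workers: 1
          node_type_id: "Standard_DS3_v2"
    tasks:
      - task_key: "create-transactions-view"
        job_cluster_key: "shared-cluster"
        # python_wheel_task / spark_python_task definition as before
      - task_key: "read-transactions-view"
        depends_on:
          - task_key: "create-transactions-view"
        job_cluster_key: "shared-cluster"
        # python_wheel_task / spark_python_task definition as before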
A more pythonic approach would be to add the code that initializes the view in every task, e.g. if you're using the pre-defined Task class (see the sketch below). Since view creation is a very cheap operation that doesn't take much time, this is quite an efficient approach.
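A minimal sketch of that pattern (assuming a dbx-style Task base class from the project template that exposes self.spark; the helper name, module path, and source table are hypothetical):

from pyspark.sql import SparkSession

from my_package.common import Task  # hypothetical module path; dbx project templates ship a Task base class


def init_transactions_view(spark: SparkSession) -> None:
    # Hypothetical helper: rebuild the global temp view from its source data,
    # so each task can resolve global_temp.transactions on its own cluster.
    transactions = spark.read.table("raw.transactions")  # assumed source table
    transactions.createOrReplaceGlobalTempView("transactions")


class WalletFeaturesTask(Task):
    def launch(self):
        # Recreate the view at the start of this task instead of relying on a
        # view created by a previous task on a different cluster.
        init_transactions_view(self.spark)

        global_temp_db = self.spark.conf.get("spark.sql.globalTempDatabase")
        transactions = self.spark.sql(f"SELECT * FROM {global_temp_db}.transactions")
        # ... feature computation continues here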