I am currently creating an external table in Databricks. To do so, I am using the DatabricksSqlOperator. I have a function that returns this Databricks operator with all the needed details in it.
@dag(...)
...
def create_external():
    return DatabricksSqlOperator(...)
However, to improve the quality of the code, I want to query the schema table in Databricks before creating the external table (to check for schema changes). I would like to know whether it is possible to use an operator inside the same function, or whether it has to be an extra task in my DAG. Since it's an intermediate task, it would be very ugly to have it everywhere.
@dag(...)
...
def create_external():
    query_result = DatabricksSqlOperator(...)
    if query_result:
        queries = [query1, query2]
    else:
        queries = [query1]
    return DatabricksSqlOperator(query=queries)
It needs to be a separate task plus a branch operator, arranged in sequence: a task that runs the test query, then a branch operator that inspects its result, then the create task(s).
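A minimal sketch of the branch decision that sequence relies on. This is plain Python so the logic is clear: in a real DAG the callable below would be handed to a BranchPythonOperator (or @task.branch), and the task ids (create_with_migration, create_table_only) and schema arguments are hypothetical names, not part of any Airflow API.

```python
# Hypothetical branch callable for the check -> branch -> create sequence.
# It returns the task_id of the downstream path to follow, which is what
# Airflow's branch operators expect from their python_callable.

def pick_create_task(current_schema: list, expected_schema: list) -> str:
    """Choose the downstream task id based on the schema check result."""
    if current_schema != expected_schema:
        # Schema drifted: follow the path that runs both queries.
        return "create_with_migration"
    # Schema unchanged: follow the path that runs the create query only.
    return "create_table_only"

# Wiring sketch (Airflow, not executed here):
# check_schema >> branch >> [create_with_migration, create_table_only]
```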
If it is a repeating pattern, you can wrap it in a task group so the check/branch/create trio appears as a single node per table in the DAG.
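To show how the grouping parametrizes, here is a plain-Python factory for the repeated pattern; in real code this function would be decorated with Airflow's @task_group and would instantiate operators instead of returning names. The table argument, the schema_changed callback, and all task names are assumptions for illustration.

```python
# Hypothetical factory for the repeated check -> branch -> create pattern.
# Returns the ordered task names one group would contain for a given table;
# a real @task_group would build and chain the operators instead.

def external_table_group(table: str, schema_changed) -> list:
    """Build the ordered task names for one table's group."""
    tasks = [f"{table}.check_schema", f"{table}.branch"]
    # The branch picks which create-task closes the group.
    if schema_changed(table):
        tasks.append(f"{table}.create_with_migration")
    else:
        tasks.append(f"{table}.create_only")
    return tasks
```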
Notes:
- Watch out for trigger_rule traps in pipelines containing branches (read here).
- By combining BranchSqlOperator and DatabricksSqlOperator, you might be able to define something like a BranchDatabricksSqlOperator that would allow you to have test_query and branch as a single task.
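A stub illustration of that last idea: one task that runs the test query and picks the downstream branch from its result. The two base classes below are minimal stand-ins, not the real Airflow classes (a real implementation would subclass BranchSQLOperator and DatabricksSqlOperator, and every name here is hypothetical).

```python
# Stand-in for the branch half: holds the two targets and the choice rule.
class SqlBranchBase:
    def __init__(self, follow_if_true, follow_if_false):
        self.follow_if_true = follow_if_true
        self.follow_if_false = follow_if_false

    def pick(self, query_result):
        """Return the task id(s) to follow based on the query result."""
        return self.follow_if_true if query_result else self.follow_if_false


# Stand-in combining both behaviours: run the test query, then branch.
class BranchDatabricksSqlStub(SqlBranchBase):
    def __init__(self, test_query, run_sql, **kwargs):
        super().__init__(**kwargs)
        self.test_query = test_query
        self.run_sql = run_sql  # injected so the stub stays self-contained

    def execute(self):
        """One task: execute test_query, then choose the branch."""
        return self.pick(self.run_sql(self.test_query))
```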