Optimize a Delta Lake table on Azure Blob Storage using a locally scoped table on Azure Databricks


How can you optimize an Azure Blob Storage Delta table on Azure Databricks without putting the table into the global scope? Optimizing and Z-ordering a Delta table on Azure Blob Storage can be done as follows (cf. docs):

spark.sql('DROP TABLE IF EXISTS T')
spark.sql("CREATE TABLE T USING DELTA LOCATION "
          "'wasbs://<container>@<account>.blob.core.windows.net/path/to/df'")
spark.sql('OPTIMIZE T ZORDER BY (colname)')
spark.sql('DROP TABLE IF EXISTS T')

However, the table T has global scope, so this command fails if another user has already registered a table named T.

A possible solution might be the following, but is this the easiest way (and why are backticks ` needed rather than single quotes ')?

spark.sql("OPTIMIZE delta.`wasbs://[email protected]/path/to/df`
           ZORDER BY (colname)")

1 Answer

Douglas M (best answer)

Two thoughts:

  1. You can & should scope the table to a database. The example above has 'default' as the database name. Just use MY_DB as an example:
spark.sql("CREATE TABLE MY_DB.T USING DELTA LOCATION
      'wasbs://[email protected]/path/to/df'"
  2. Yes, your suggestion is also correct. The backticks are a Spark-ism for specifying the location of a data set directly (delta.`<path>`) in a SQL statement, e.g. in a SELECT clause; a combined sketch of both approaches follows this list.
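A minimal end-to-end sketch of both points, assuming a Databricks cluster with Delta Lake support; MY_DB, T, colname, and the placeholder storage path are illustrative names only:

# Placeholder path; substitute your own container, storage account and path.
path = "wasbs://<container>@<account>.blob.core.windows.net/path/to/df"

# 1. Scope the table to your own database instead of the shared 'default' one.
spark.sql("CREATE DATABASE IF NOT EXISTS MY_DB")
spark.sql(f"CREATE TABLE IF NOT EXISTS MY_DB.T USING DELTA LOCATION '{path}'")
spark.sql("OPTIMIZE MY_DB.T ZORDER BY (colname)")
# Dropping the external table only removes the metastore entry, not the files.
spark.sql("DROP TABLE IF EXISTS MY_DB.T")

# 2. Or skip table registration entirely and address the data set by its path.
#    The same backtick syntax also works for reads, e.g. in a SELECT:
spark.sql(f"OPTIMIZE delta.`{path}` ZORDER BY (colname)")
df = spark.sql(f"SELECT * FROM delta.`{path}` LIMIT 10")
df.show()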