Optimize a Delta Lake table on Azure Blob Storage using a locally scoped table on Azure Databricks


How can you optimize an Azure Blob Storage Delta table on Azure Databricks without putting the table into the global scope? Optimizing and Z-ordering a Delta table on Azure Blob Storage can be done as follows (cf. docs):

spark.sql('DROP TABLE IF EXISTS T')
spark.sql("CREATE TABLE T USING DELTA LOCATION "
          "'wasbs://<container>@<account>.blob.core.windows.net/path/to/df'")
spark.sql('OPTIMIZE T ZORDER BY (colname)')
spark.sql('DROP TABLE IF EXISTS T')

However, the table T has global scope, so this command fails if another user has already registered a table named T.

A possible solution might be the following, but is this the easiest way (and why are backticks ` needed rather than single quotes ')?

spark.sql("OPTIMIZE delta.`wasbs://[email protected]/path/to/df`
           ZORDER BY (colname)")

1 Answer

Douglas M (best answer)

Two thoughts:

  1. You can & should scope the table to a database. The example above has 'default' as the database name. Just use MY_DB as an example:
spark.sql("CREATE TABLE MY_DB.T USING DELTA LOCATION
      'wasbs://[email protected]/path/to/df'"
  2. Yes, your suggestion is also correct. The backticks are a Spark-ism for specifying the location of a data set directly (delta.`<path>`) in a SQL statement, e.g. in a SELECT clause; a combined sketch of both approaches follows this list.
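A minimal end-to-end sketch of both points, assuming a Databricks cluster with Delta Lake support; MY_DB, T, colname, and the placeholder storage path are illustrative names only:

# Placeholder path; substitute your own container, storage account and path.
path = "wasbs://<container>@<account>.blob.core.windows.net/path/to/df"

# 1. Scope the table to your own database instead of the shared 'default' one.
spark.sql("CREATE DATABASE IF NOT EXISTS MY_DB")
spark.sql(f"CREATE TABLE IF NOT EXISTS MY_DB.T USING DELTA LOCATION '{path}'")
spark.sql("OPTIMIZE MY_DB.T ZORDER BY (colname)")
# Dropping the external table only removes the metastore entry, not the files.
spark.sql("DROP TABLE IF EXISTS MY_DB.T")

# 2. Or skip table registration entirely and address the data set by its path.
#    The same backtick syntax also works for reads, e.g. in a SELECT:
spark.sql(f"OPTIMIZE delta.`{path}` ZORDER BY (colname)")
df = spark.sql(f"SELECT * FROM delta.`{path}` LIMIT 10")
df.show()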