Check if a Delta table exists at a path in Databricks


I need to delete certain data from a Delta Lake table before I load it. I can delete the data from the Delta table when it exists, but the delete fails when the table does not exist.

Databricks Scala code below:

// import the Delta Lake Scala API
import io.delta.tables.DeltaTable

// create delete condition
val del_ID = "Check_ID = " + "123"

// get delta table from path where data exists
val deltaTable = DeltaTable.forPath(spark, path)

// delete data from delta table
deltaTable.delete(del_ID)

The above code works only if the Delta data exists on that path; otherwise it fails.

Can someone share an approach where the delete statement is executed if the Delta data exists, and is skipped otherwise?

There are 3 answers

Vincent Doba (BEST ANSWER)

According to DeltaTable's Javadoc, you can check whether there is a Delta table at the specified path with the following command:

DeltaTable.isDeltaTable(spark, "path/to/table")

If the path does not exist or does not contain a Delta table, it returns false. So your code would be:

import io.delta.tables.DeltaTable

val del_ID = "Check_ID = " + "123"
if (DeltaTable.isDeltaTable(spark, path)) {
  DeltaTable.forPath(spark, path).delete(del_ID)
}
Ina K

You can check whether the table exists in the information_schema of the catalog. The result of this query (result_df below) is either an empty table (if no such table exists) or a table with one row. After that you apply the "if" logic: if result_df is not empty, do something (drop the table, etc.); otherwise do something else.

If you use the code below directly, don't forget to replace "some_catalog_name" with your catalog name and "the table you check if exists" with your table name.

result_df = spark.sql(
    "SELECT * FROM some_catalog_name.information_schema.tables "
    "WHERE table_name = 'the table you check if exists'"
)

if not result_df.isEmpty():
    ...  # do something: drop the table, etc.
else:
    ...  # do something else
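
Applied to the original question, the same check can gate the conditional delete. A minimal sketch, assuming hypothetical names some_catalog_name.some_schema.my_table (replace with your own):

# hypothetical catalog / schema / table names - replace with your own
full_name = "some_catalog_name.some_schema.my_table"

exists_df = spark.sql(
    "SELECT 1 FROM some_catalog_name.information_schema.tables "
    "WHERE table_schema = 'some_schema' AND table_name = 'my_table'"
)

if not exists_df.isEmpty():
    # table exists: run the conditional delete from the question
    spark.sql(f"DELETE FROM {full_name} WHERE Check_ID = 123")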
AidinZadeh

Another way, in Python, using the Spark SQL API:

spark.sql(f"DESCRIBE DETAIL '{path}'").collect()[0].asDict()['format'] == 'delta'

This comes in handy when you are not able to use the Delta API, for example with databricks-connect.
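
Note that DESCRIBE DETAIL raises an error when the path does not exist at all, so you may still want to guard the call before deleting. A minimal sketch (the is_delta helper is hypothetical; spark is assumed to be an active SparkSession):

from pyspark.sql.utils import AnalysisException

def is_delta(path: str) -> bool:
    # True only if the path exists and holds a Delta table
    try:
        detail = spark.sql(f"DESCRIBE DETAIL '{path}'").collect()[0]
        return detail.asDict().get("format") == "delta"
    except AnalysisException:
        # path does not exist or is not a table
        return False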