Adding column metadata comments in a Delta Live Table


I am working on creating a Delta Live Table, and I want to add column-level metadata comments.

Below is my code:

import dlt
from pyspark.sql.functions import col

@dlt.table(
    comment="Flattened table for Student data",
    name="Flattened_table"
)
def flatten():
    df = spark.readStream.format("delta").load("url")
    column_descriptions_dict = {
        "colname1": "comment for colname 1",
        "colname2": "comment for colname 2"
    }
    # Re-alias every column, attaching its comment as column metadata
    for field in df.schema.fields:
        df = df.withColumn(
            field.name,
            col(field.name).alias(field.name, metadata={"comment": column_descriptions_dict[field.name]})
        )
    return df

But once I check the DLT table, I do not see any comments (metadata) for my columns.

Does the DLT table not take PySpark metadata into consideration?
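
As far as I can tell, the metadata is attached to the DataFrame itself; printing the schema fields shows the comments:

# Quick check: print each field's metadata on the DataFrame returned above
for field in df.schema.fields:
    print(field.name, field.metadata)
# prints e.g.: colname1 {'comment': 'comment for colname 1'}

So the alias metadata exists on the DataFrame, but it never shows up on the published table.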


There is 1 answer

Answered by JayashankarGS

You need to pass the schema in the @dlt.table definition, like below:

import dlt
from delta.tables import DeltaTable
from pyspark.sql.types import StructType, StructField

# Read the column names and types from the existing Delta table
deltaTable = DeltaTable.forPath(spark, "dbfs:/dlt_temp/tables/clickstream_raw")
fields = deltaTable.toDF().schema.fields

column_descriptions_dict = {
    "state": "state of store",
    "store_id": "store id",
    "product_category": "product cat",
    "SKU": "SKU",
    "price": "total price"
}

# Rebuild the schema, attaching each comment to its field's metadata
new_sc = []
for field in fields:
    sc = StructField(field.name, field.dataType, True, {"comment": column_descriptions_dict[field.name]})
    new_sc.append(sc)

new_schema = StructType(new_sc)

@dlt.table(
    comment="Flattened table for Student data",
    name='flatten_student',
    schema=new_schema
)
def flatten():
    df = spark.readStream.format("delta").load("dbfs:/dlt_temp/tables/clickstream_raw")
    return df

Then the column comments will appear correctly on the table.
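
Note that the schema argument also accepts a SQL DDL string, so as an alternative you can declare the comments inline instead of rebuilding a StructType. A sketch under the same column names (the column types here are assumptions; adjust them to your actual table):

@dlt.table(
    comment="Flattened table for Student data",
    name="flatten_student",
    schema="""
        state STRING COMMENT 'state of store',
        store_id STRING COMMENT 'store id',
        product_category STRING COMMENT 'product cat',
        SKU STRING COMMENT 'SKU',
        price FLOAT COMMENT 'total price'
    """
)
def flatten():
    return spark.readStream.format("delta").load("dbfs:/dlt_temp/tables/clickstream_raw")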

Output:

(Screenshot of the table's column details, showing the comments applied to each column.)
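
You can also verify the comments with DESCRIBE TABLE once the pipeline has published the table (assuming your session is attached to the pipeline's target schema):

spark.sql("DESCRIBE TABLE flatten_student").show(truncate=False)
# The `comment` column of the output should contain the descriptions,
# e.g. "state of store" for the state column.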