Indexing rows in Auto Loader when loading from Parquet files


Simply put, I have an Auto Loader stream ingesting Parquet files that contain CDC events. Sometimes two events have the same commit time, and in that case I want to use the order of the rows within the Parquet file as a second ordering criterion. Basically I need a per-file row number, but I can't use row_number() or monotonically_increasing_id(). Additionally, since this runs inside DLT, I can't rely on foreachBatch().
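To make the ordering I'm after concrete, here is a minimal sketch in plain Python (not Spark). The field names `commit_ts`, `row_idx` and `op` are hypothetical and only for illustration; `row_idx` stands in for the per-file row position that I don't currently know how to obtain.

```python
from operator import itemgetter

# Hypothetical CDC events as they might be read from one Parquet file.
# "row_idx" is the position of the row inside the file (the value I am missing).
events = [
    {"commit_ts": 100, "row_idx": 2, "op": "delete"},
    {"commit_ts": 100, "row_idx": 0, "op": "insert"},
    {"commit_ts": 99, "row_idx": 5, "op": "insert"},
    {"commit_ts": 100, "row_idx": 1, "op": "update"},
]

# Order by commit time first, then break ties by position within the file.
ordered = sorted(events, key=itemgetter("commit_ts", "row_idx"))
ops = [e["op"] for e in ordered]
```

The three events that share commit time 100 end up in file order (insert, update, delete), which is the tie-break behaviour I need from the pipeline.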

I know that Snowflake has METADATA$FILE_ROW_NUMBER, so I'm looking for something similar in Databricks.


There are 0 answers