Azure Data Governance Solution approach for Data Lakes


I am evaluating how to implement a data governance solution with Azure Data Catalog for a Data Lake batch transformation pipeline. My approach is outlined below. Any insights would be welcome.

  1. Data Factory can't capture the lineage from source to Data Lake.
  2. I know Data Catalog can't maintain business rules for data curation on the Data Lake.
  3. First, the data feed is onboarded manually in Azure Data Catalog under a given business glossary. Alternatively, when a raw data feed is ingested into Data Lake Storage, the asset is created automatically under the given business glossary if it does not already exist.
  4. The raw data is cleaned, classified and tagged during a light transformation on the lake, so the related tags need to be created in Data Catalog. (This is custom code calling the Azure Data Catalog REST APIs.)
  5. Then there is ETL processing, using Spark-based tools. New data assets need to be created and tagged in Data Catalog. (This is again custom code calling the Azure Data Catalog REST APIs.) Finally, Data Catalog will display all data assets created by the Data Lake batch transformation pipeline under the specific business glossary with the right tags.
  6. I am skipping operational metadata and full lineage, as there is no such solution among the Azure offerings. This would need to be a custom solution again.
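
To make steps 3 to 5 concrete, here is a minimal sketch of the "create the asset if it does not exist, with its tags" logic that the custom code would need. Everything in it is an assumption for illustration: `LakeAsset`, `build_asset_payload`, `ensure_registered`, the `glossaryTerm` field, and the overall payload shape are hypothetical and are not taken from the Azure Data Catalog REST API schema, which would have to be checked before sending anything real. The HTTP call is left out on purpose and replaced by an injected `send` function.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LakeAsset:
    """A file or folder in the lake that should appear in the catalog (hypothetical model)."""
    path: str            # e.g. "raw/sales/orders.csv"
    glossary_term: str   # business glossary term the asset belongs to
    tags: list = field(default_factory=list)


def build_asset_payload(asset: LakeAsset) -> dict:
    """Build a registration body.

    The field names are illustrative placeholders, NOT the documented
    Azure Data Catalog schema.
    """
    unique_tags = sorted(set(asset.tags))  # de-duplicate and keep a stable order
    return {
        "properties": {"name": asset.path},
        "annotations": {
            "glossaryTerm": asset.glossary_term,
            "tags": [{"properties": {"tag": t}} for t in unique_tags],
        },
    }


def ensure_registered(asset: LakeAsset, existing_paths: set, send) -> bool:
    """Register the asset only if its path is not already known.

    `existing_paths` stands in for a lookup against the catalog, and `send`
    stands in for an authenticated HTTP POST. Returns True if a request
    was sent, False if the asset was already registered.
    """
    if asset.path in existing_paths:
        return False
    send(json.dumps(build_asset_payload(asset)))
    existing_paths.add(asset.path)
    return True
```

In a Spark-based job, something like `ensure_registered` would presumably run once per output dataset after the write succeeds, with `send` wrapping the real REST call and `existing_paths` filled from a catalog search. Keeping the payload builder separate from the transport makes the tagging rules testable without a catalog instance.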

I am looking for the best practice. Appreciate your thoughts.

Many thanks

Cengiz
