I am looking for a Google-native option for data quality and came across Dataplex in GCP. However, there are two ways to define data quality rules in Dataplex: (i) via Process and (ii) via Govern.
What is the difference between a Dataplex Data Quality task (found under Dataplex > Manage Lakes > Process) and a Dataplex Data Quality scan (found under Dataplex > Govern > Data Quality)?
The first one seems to be entirely user-defined, based on a YAML configuration file, and its output is stored in BigQuery.
The second one is more of a drag-and-drop, out-of-the-box data quality feature with built-in data quality rules, and we can also write our own rules. But the results are not stored in BigQuery.
Is there any other difference? Not having the results stored in BigQuery is definitely a major drawback, but out-of-the-box features are always a plus.
Any thoughts on this?
The data quality task (under 'Process') is more of a DIY solution, whereas AutoDQ (under 'Govern') is a fully managed solution.
AutoDQ (along with Data Profiling) has been in public preview for about 6 months and is expected to become Generally Available in Q3 2023.
Last week, AutoDQ (and Data Profiling) added the ability to export results to your own BigQuery table.
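To give a rough idea of what a scan with BigQuery export might look like when defined programmatically rather than through the console, here is a minimal sketch that only builds the request body as a plain dictionary. The field names (dataQualitySpec, rules, postScanActions, bigqueryExport, resultsTable) reflect my understanding of the Dataplex DataScan REST resource and should be checked against the current API reference. The project, dataset, table, and column names are hypothetical, and the helper build_scan_body is made up for illustration.

```python
import json


def build_scan_body(source_table: str, results_table: str) -> dict:
    """Build a data quality scan body with two rules and a BigQuery export target.

    Both arguments are hypothetical paths of the form
    'projects/<project>/datasets/<dataset>/tables/<table>'.
    """
    return {
        # The BigQuery table to scan.
        "data": {"resource": f"//bigquery.googleapis.com/{source_table}"},
        "dataQualitySpec": {
            "rules": [
                # Built-in rule: the column must not contain NULLs.
                {
                    "column": "order_id",
                    "dimension": "COMPLETENESS",
                    "nonNullExpectation": {},
                },
                # Built-in rule: the column must be at least 0.
                {
                    "column": "amount",
                    "dimension": "VALIDITY",
                    "rangeExpectation": {"minValue": "0"},
                },
            ],
            # Assumed location of the export setting: results are written
            # to a table you own.
            "postScanActions": {
                "bigqueryExport": {
                    "resultsTable": f"//bigquery.googleapis.com/{results_table}"
                }
            },
        },
    }


body = build_scan_body(
    "projects/my-project/datasets/sales/tables/orders",
    "projects/my-project/datasets/dq/tables/scan_results",
)
print(json.dumps(body, indent=2))
```

This does not call any Google API. It only shows the shape of the definition, which you would then submit through the console, gcloud, or a client library of your choice.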