Attach a tag from an existing template, along with values, to a BigQuery table using Data Catalog

I need help using a tag template that I created in Google Data Catalog to tag several BigQuery tables from Python. I found a sample that creates a template and applies it to one table, but I don't know how to reuse that same template to attach tags to several tables. I would appreciate some help or guidance on how to do this. Thanks. As a reference I used this quickstart: https://cloud.google.com/data-catalog/docs/tag-bigquery-dataset#python

Answered by Piotr Zalas

You should be able to tag multiple BigQuery tables with one tag template. Just be aware that each table, and each column in a table, can be tagged only once with a given tag template (i.e. you can't create two or more tags on the same table that use the same tag template).
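Because of that limit, it can help to check whether a table already carries a tag from your template before creating another one. The following is only a sketch: the helper name `find_tag_for_template` is made up for illustration, `client` is assumed to be a `datacatalog_v1.DataCatalogClient`, and `entry_name` is assumed to be the Data Catalog entry name of the table (for example the `name` of the entry returned by LookupEntry).

```python
def find_tag_for_template(client, entry_name, template_name, column=""):
    """Return the existing tag on an entry that uses the given template, or None.

    Hypothetical helper. `client` is assumed to behave like
    datacatalog_v1.DataCatalogClient. An empty `column` means a
    table-level tag; pass a column name to look for a column-level tag.
    """
    # list_tags yields every tag attached to the entry, across all templates.
    for tag in client.list_tags(parent=entry_name):
        if tag.template == template_name and tag.column == column:
            return tag
    return None
```

If this returns a tag, you would update or delete that tag rather than call CreateTag again for the same table and template.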

Based on the Python code snippet attached to the mentioned quickstart, after creating the tag template you should do the following:

  1. Check which fields of the tag template are required. You must set them in your tagging request. By default none of the fields is required.

  2. For each table that you want to tag:

    1. Prepare the BigQuery resource name of the table. The quickstart does this with the following snippet:

      resource_name = (
          f"//bigquery.googleapis.com/projects/{project_id}"
          f"/datasets/{dataset_id}/tables/{table_id}"
      )
      
    2. Convert the BigQuery resource name to a Data Catalog entry name. BigQuery tables are imported into Data Catalog automatically. Unfortunately, the names of the Data Catalog entries that represent BigQuery tables can't easily be deduced from the BigQuery resource names. This is why you must call the LookupEntry method:

      table_entry = datacatalog_client.lookup_entry(request={"linked_resource": resource_name})
      
    3. Now you create a tag. You do this by specifying:

      • Tag template to be used. The tag's own name is a resource name that Data Catalog generates when the tag is created, so you don't set it here:

        tag = datacatalog_v1.types.Tag()
        tag.template = tag_template.name
        
      • Fields in the tag. You must specify at least all required fields. Each field name must match an existing field in the tag template, and each value must have the type declared for that field in the tag template. In this snippet we have a field "source" of string type:

        tag.fields["source"] = datacatalog_v1.types.TagField()
        tag.fields["source"].string_value = "Copied from tlc_yellow_trips_2018"
        
      • Call the CreateTag method to tag the table. The returned tag has some additional output-only fields filled in, e.g. the unique tag name, so you can't reuse it in subsequent requests. Build a new Tag object for each table instead:

        tag = datacatalog_client.create_tag(parent=table_entry.name, tag=tag)
        print(f"Created tag: {tag.name}")
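Put together, the steps above can be sketched as a loop over tables. This is a rough sketch, not the quickstart's code: the function names `bigquery_resource_name`, `missing_required_fields` and `tag_tables` are made up for illustration, `client` is assumed to be a `datacatalog_v1.DataCatalogClient`, and the tag is passed as a plain dict inside the request on the assumption that the client converts it to a `Tag` message.

```python
def bigquery_resource_name(project_id, dataset_id, table_id):
    """Build the BigQuery resource name that LookupEntry expects."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def missing_required_fields(tag_template, field_values):
    """Return the IDs of required template fields that have no value."""
    return sorted(
        field_id
        for field_id, field in tag_template.fields.items()
        if field.is_required and field_id not in field_values
    )


def tag_tables(client, template_name, project_id, dataset_id, table_ids,
               field_values):
    """Tag every table in `table_ids` with one existing tag template.

    Hypothetical helper. `field_values` maps field IDs to typed values,
    e.g. {"source": {"string_value": "Copied from tlc_yellow_trips_2018"}}.
    Returns the generated tag names.
    """
    # Step 1: make sure every required field of the template has a value.
    template = client.get_tag_template(name=template_name)
    missing = missing_required_fields(template, field_values)
    if missing:
        raise ValueError(f"Missing required fields: {missing}")

    created = []
    for table_id in table_ids:
        # Step 2.1 and 2.2: resource name, then Data Catalog entry lookup.
        resource_name = bigquery_resource_name(project_id, dataset_id, table_id)
        entry = client.lookup_entry(request={"linked_resource": resource_name})

        # Step 2.3: build a fresh tag for each table; a returned tag
        # carries output-only fields and should not be reused.
        tag = {"template": template_name, "fields": dict(field_values)}
        created_tag = client.create_tag(
            request={"parent": entry.name, "tag": tag}
        )
        created.append(created_tag.name)
    return created
```

To reuse a template that already exists, you would pass its resource name as `template_name`, which has the form `projects/{project_id}/locations/{location}/tagTemplates/{tag_template_id}`, instead of creating the template again.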