How to use custom info types in a Cloud Data Loss Prevention (Google Cloud Platform) de-identify template?

I am developing a PII de-identification application using Cloud Data Loss Prevention (DLP) on GCP. I am using a de-identification template to hold the de-identification rules.

Issue: I cannot figure out how to use custom info types in the de-identification template.

Here is a sample de-identification template:

{
  "deidentifyTemplate":{
    "displayName":"Email and id masker",
    "description":"De-identifies emails and ids with a series of asterisks.",
    "deidentifyConfig":{
      "infoTypeTransformations":{
        "transformations":[
          {
            "infoTypes":[
              {
                "name":"EMAIL_ADDRESS"
              }
            ],
            "primitiveTransformation":{
              "characterMaskConfig":{
                "maskingCharacter":"*"
              }
            }
          }
        ]
      }
    }
  }
}

The example above uses a built-in info type (EMAIL_ADDRESS). In the documentation, the custom info type snippet looks like this:

    "inspect_config":{
      "custom_info_types":[
        {
          "info_type":{
            "name":"CUSTOM_ID"
          },
          "regex":{
            "pattern":"[1-9]{2}-[1-9]{4}"
          },
          "likelihood":"POSSIBLE"
        }
      ]
  }
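
To see what the CUSTOM_ID pattern from this snippet would pick up, it can be tried locally first. This is only a rough sketch using Python's `re` module (DLP evaluates regexes with RE2 syntax, but a pattern this simple should behave the same way in both); `find_custom_ids` is a hypothetical helper name, not part of any DLP library.

```python
import re

# Pattern copied from the custom info type snippet above.
CUSTOM_ID_PATTERN = r"[1-9]{2}-[1-9]{4}"


def find_custom_ids(text):
    """Return every substring of text that matches the CUSTOM_ID pattern."""
    return re.findall(CUSTOM_ID_PATTERN, text)


# [1-9] excludes the digit 0, so "10-2345" is not reported.
print(find_custom_ids("ids: 12-3456, 10-2345, 98-7654"))
```

Note that the character class `[1-9]` excludes zero, so ids containing a 0 would not be detected by this pattern.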

There is no valid object definition for inspect_config in the REST documentation of the de-identification template; it is only valid in an inspection template.

Is it possible to use custom info types in a de-identification template (infoTypeTransformations)?

Here is the link to the REST documentation.

There are 2 answers

Arnab Mukherjee

We can use custom info types in a de-identification template by using stored info types.

A stored info type can be created with an API call, and it can then be referenced like a built-in info type.

Creating a stored info type

  • A few global variables and dependencies
import os

from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
default_project = os.environ['GOOGLE_PROJECT']  # project id
parent = f"projects/{default_project}"

# details of the custom info type
custom_info_id = "<unique-id>"  # example: IP_ADDRESS
custom_info_id_pattern = r"<regex pattern>"
  • Creating the request payload
info_config = {
    "display_name": custom_info_id,
    "description": custom_info_id,
    "regex": {
        "pattern": custom_info_id_pattern
    }
}
  • Making the API call
response = dlp.create_stored_info_type(request={
    "parent": parent,
    "config": info_config,
    "stored_info_type_id": custom_info_id
})
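
The three steps above can also be folded into one small helper that only builds the request dictionary, so the payload can be inspected before any API call is made. This is a sketch under assumptions: `build_stored_info_type_request` is a hypothetical name, "my-project" is a made-up project id, and the `re.compile` check uses Python's regex engine, which is not identical to the RE2 syntax DLP uses.

```python
import re


def build_stored_info_type_request(project_id, info_type_id, pattern):
    """Build the request dict for create_stored_info_type (no API call here)."""
    # Rough local sanity check; raises re.error if the pattern is malformed.
    re.compile(pattern)
    return {
        "parent": f"projects/{project_id}",
        "config": {
            "display_name": info_type_id,
            "description": info_type_id,
            "regex": {"pattern": pattern},
        },
        "stored_info_type_id": info_type_id,
    }


# "my-project" is a placeholder project id used only for illustration.
request = build_stored_info_type_request(
    "my-project", "CUSTOM_ID", r"[1-9]{2}-[1-9]{4}"
)
print(request["parent"])
```

The resulting dictionary has the same shape as the one passed as `request=` in the call above.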

How to reference the stored info type

  • You need to use the stored_info_type_id in the de-identification template for the operation:
          {
            "info_types":[
              {
                "name":"IP_ADDRESS"  # the stored_info_type_id defined above
              }
            ],
            "primitive_transformation":{
              "character_mask_config":{
                "characters_to_ignore":[
                  {
                    "characters_to_skip":"."
                  }
                ],
                "masking_character":"*"
              }
            }
          },
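
To get a feel for what this character mask configuration is meant to produce, here is a small local approximation. It is not DLP's implementation, only a sketch of the expected effect: every character is replaced by the masking character except those listed in characters_to_skip. `mask_value` is a hypothetical helper name.

```python
def mask_value(value, masking_character="*", characters_to_skip="."):
    """Mask every character of value except those in characters_to_skip."""
    return "".join(
        ch if ch in characters_to_skip else masking_character
        for ch in value
    )


# With the settings from the transformation above, the dots survive.
print(mask_value("192.168.0.1"))  # ***.***.*.*
```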

Nek

Yes, it is possible to use custom info types. What you need to do is create both a De-Identify Template and an Inspect Template; the custom info types are defined in the Inspect Template.

Then, when you call the API, you pass both templates as parameters. In Python, using the DLP client library, here is a rough sample:

from google.cloud import dlp_v2

dlp_client = dlp_v2.DlpServiceClient()
response = dlp_client.deidentify_content(
    request={
        # Replace the <...> placeholders with your own values.
        "inspect_template_name": "projects/<project>/locations/global/inspectTemplates/<templateId>",
        "deidentify_template_name": "projects/<project>/locations/global/deidentifyTemplates/<templateId>",
        "parent": "<parent>",
        "item": {"value": "<item>"},  # content to de-identify, here plain text
    }
)
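
If the resource names are assembled in several places, a small helper along these lines may keep them consistent. It is only a sketch: `build_deidentify_request` and the ids in the usage line are hypothetical, and it assumes both templates live under the `global` location, as in the names above.

```python
def build_deidentify_request(project_id, inspect_template_id,
                             deidentify_template_id, text):
    """Build a deidentify_content request that references both templates."""
    # Assumes both templates were created in the "global" location.
    parent = f"projects/{project_id}/locations/global"
    return {
        "parent": parent,
        "inspect_template_name": f"{parent}/inspectTemplates/{inspect_template_id}",
        "deidentify_template_name": f"{parent}/deidentifyTemplates/{deidentify_template_id}",
        "item": {"value": text},
    }


# All ids below are placeholders used only for illustration.
request = build_deidentify_request(
    "my-project", "my-inspect", "my-deid", "id 12-3456"
)
print(request["inspect_template_name"])
```

The dictionary can then be passed as `dlp_client.deidentify_content(request=request)`; the de-identified text should be available as `response.item.value`.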