I would like the data to be masked, but in a way that still makes it possible to tell how many people studied at UNIVERSITY_1.
What de-identification transformation can I use to accomplish this kind of information/text masking?
Input:
```json
{
  "students": [
    {
      "name": "John Smith",
      "university": "University of Pennsylvania"
    },
    {
      "formattedName": "Mike Miller",
      "university": "Harvard University"
    },
    {
      "formattedName": "Elon Musk",
      "university": "University of Pennsylvania"
    }
  ]
}
```
Desired output:
```json
{
  "students": [
    {
      "name": "John Smith",
      "university": "UNIVERSITY_1"
    },
    {
      "formattedName": "Mike Miller",
      "university": "UNIVERSITY_2"
    },
    {
      "formattedName": "Elon Musk",
      "university": "UNIVERSITY_1"
    }
  ]
}
```
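For illustration only, the requested mapping can be sketched locally in Python. This is not a DLP transformation: `number_universities` is a hypothetical helper that numbers each distinct university in order of first appearance, so equal inputs get equal labels.

```python
import json


def number_universities(doc: dict) -> dict:
    """Hypothetical helper: replace each distinct university with UNIVERSITY_<n>.

    Labels are assigned in order of first appearance, so identical
    university strings always receive the same label within one call.
    """
    labels: dict[str, str] = {}
    students = []
    for student in doc["students"]:
        masked = dict(student)  # shallow copy; other fields stay untouched
        uni = student["university"]
        if uni not in labels:
            labels[uni] = f"UNIVERSITY_{len(labels) + 1}"
        masked["university"] = labels[uni]
        students.append(masked)
    return {"students": students}


data = {
    "students": [
        {"name": "John Smith", "university": "University of Pennsylvania"},
        {"formattedName": "Mike Miller", "university": "Harvard University"},
        {"formattedName": "Elon Musk", "university": "University of Pennsylvania"},
    ]
}

result = number_universities(data)
print(json.dumps(result, indent=2))
```

Note the trade-off: the numbering depends on record order, so it is not stable across separate batches, and the in-memory lookup table makes the mapping reversible for anyone who keeps it.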
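You could use cryptographic hashing (`CryptoHashConfig`): https://cloud.google.com/dlp/docs/deidentify-sensitive-data#cryptohashconfig
Each value is replaced by a keyed hash, so the same university always maps to the same token and you can still count how many students share it. The token is a hash string, though, not a sequential label like UNIVERSITY_1.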
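To show why hashing preserves counts, here is a local sketch using HMAC-SHA-256 from the Python standard library as a stand-in for a keyed crypto hash. It is an assumption for illustration, not the DLP client library, and will not reproduce DLP's exact output; the helper `crypto_hash` and the random key are hypothetical.

```python
import base64
import hashlib
import hmac
import os
from collections import Counter

# Assumption: a random 32-byte secret stands in for the DLP crypto key.
# A fresh key per run keeps tokens consistent only within this run;
# persist a real key if tokens must stay stable across runs.
KEY = os.urandom(32)


def crypto_hash(value: str, key: bytes = KEY) -> str:
    """Hypothetical helper: keyed SHA-256 hash, base64-encoded."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


universities = [
    "University of Pennsylvania",
    "Harvard University",
    "University of Pennsylvania",
]

# Identical inputs produce identical tokens, so grouping still works.
tokens = [crypto_hash(u) for u in universities]
counts = Counter(tokens)
print(counts)
```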