How to remove PII from URL (GA4 w/ GTM)


PII (Personally Identifiable Information) should never be sent to Google Analytics. Not only does it breach GA's terms of use, you are also leaking sensitive user data. So how do you remove PII from a URL, such as query string parameters (email, userId, ...) or even parts of the location path, when using Google Tag Manager (GTM) and Google Analytics 4 (GA4)?

Answer by petriq

Let's assume you've already set up a GA4 property and installed GTM on your page.

Let's create a new tag for the GA4 configuration. For the Measurement ID I use a Lookup Table variable (it's perfect when you have multiple environments such as testing, staging, and production that each have a separate Measurement ID but share the same GTM install script), but you can simply enter your G-XXXXXXXXX Measurement ID here. Then expand the Fields to Set section, add page_location as the Field Name, and click the Lego-brick button next to Value.
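A Lookup Table variable maps the value of an input variable (for example the built-in {{Page Hostname}}) to an output value. The mapping it performs is roughly the function below. The hostnames and Measurement IDs are hypothetical placeholders, and in GTM you would configure this in the Lookup Table UI rather than in code; the sketch only illustrates the idea.

```javascript
// Rough equivalent of a GTM Lookup Table variable keyed on the page hostname.
// Hostnames and Measurement IDs below are hypothetical placeholders.
var MEASUREMENT_IDS = {
  'test.example.com': 'G-TEST000000',
  'staging.example.com': 'G-STAGE00000',
  'www.example.com': 'G-PROD000000'
};

// Returns the Measurement ID for a hostname, or undefined when nothing matches
// (like a Lookup Table with no default value set).
function measurementIdFor(hostname) {
  return Object.prototype.hasOwnProperty.call(MEASUREMENT_IDS, hostname)
    ? MEASUREMENT_IDS[hostname]
    : undefined;
}

console.log(measurementIdFor('staging.example.com')); // G-STAGE00000
```

Returning undefined for unknown hosts means a page served from an unexpected hostname gets no Measurement ID instead of silently reporting into production.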

Click the + (plus) button in the upper right corner to add a new variable.

As the Variable Type, choose Custom JavaScript. In the upper left corner, enter a name for your new variable; I used Redacted Page Location.

Now we're getting to the actual PII removal. In the Custom JavaScript section, insert a JS function that returns the redacted URL. My function uses regular expressions to replace PII in the URL with placeholder text. The values I wanted to redact are the company, project, epic, and task IDs in the URL path, and userId in the query string.

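function() {
  // Full URL of the current page (same value as window.location.href).
  var url = window.location.toString();

  // Each entry pairs a regex that matches a PII value with its placeholder.
  // \d+ only matches numeric IDs; adjust the patterns if your IDs are not numeric.
  var filter = [
    {
      // Company ID in the path, e.g. /company/2247
      rx: /company\/\d+/g,
      replacement: 'company/REDACTED_COMPANY_ID'
    },
    {
      // Project ID in the path, e.g. /projects/2114
      rx: /projects\/\d+/g,
      replacement: 'projects/REDACTED_PROJECT_ID'
    },
    {
      // Epic ID in the path, e.g. /epics/19258
      rx: /epics\/\d+/g,
      replacement: 'epics/REDACTED_EPIC_ID'
    },
    {
      // Task ID in the path, e.g. /tasks/19259
      rx: /tasks\/\d+/g,
      replacement: 'tasks/REDACTED_TASK_ID'
    },
    {
      // userId in the query string, e.g. ?userId=1234567
      rx: /userId=\d+/g,
      replacement: 'userId=REDACTED_USER_ID'
    }
  ];

  // Apply every pattern to the URL in turn.
  filter.forEach(function(item) {
    url = url.replace(item.rx, item.replacement);
  });

  // GTM uses the return value as the variable's value (here: page_location).
  return url;
}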

For example, if the URL of my page is https://www.example.com/company/2247/projects/2114/epics/19258/tasks/19259?userId=1234567, this function redacts it to https://www.example.com/company/REDACTED_COMPANY_ID/projects/REDACTED_PROJECT_ID/epics/REDACTED_EPIC_ID/tasks/REDACTED_TASK_ID?userId=REDACTED_USER_ID.
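To try these patterns outside GTM, the same logic can be written as a pure function that takes the URL as an argument instead of reading window.location. This sketch is a variation, not the function above: it applies the regexes to the path only and then uses the standard URL and URLSearchParams APIs to replace whole query values by parameter name, so non-numeric values such as email addresses are caught too. The PII_PARAMS list (userId, email) is a hypothetical example; use your own parameter names.

```javascript
// Path patterns: numeric IDs after known path segments.
var PATH_FILTERS = [
  { rx: /company\/\d+/g, replacement: 'company/REDACTED_COMPANY_ID' },
  { rx: /projects\/\d+/g, replacement: 'projects/REDACTED_PROJECT_ID' },
  { rx: /epics\/\d+/g, replacement: 'epics/REDACTED_EPIC_ID' },
  { rx: /tasks\/\d+/g, replacement: 'tasks/REDACTED_TASK_ID' }
];

// Hypothetical query parameters whose values are always treated as PII.
var PII_PARAMS = { userId: 'REDACTED_USER_ID', email: 'REDACTED_EMAIL' };

function redactUrl(href) {
  var url = new URL(href);

  // 1. Redact IDs in the path only, so query values are never matched here.
  var path = url.pathname;
  PATH_FILTERS.forEach(function (f) {
    path = path.replace(f.rx, f.replacement);
  });
  url.pathname = path;

  // 2. Replace the whole value of each listed parameter, whatever its format.
  Object.keys(PII_PARAMS).forEach(function (name) {
    if (url.searchParams.has(name)) {
      url.searchParams.set(name, PII_PARAMS[name]);
    }
  });

  return url.toString();
}

console.log(redactUrl('https://www.example.com/company/2247/projects/2114/epics/19258/tasks/19259?userId=1234567'));
// https://www.example.com/company/REDACTED_COMPANY_ID/projects/REDACTED_PROJECT_ID/epics/REDACTED_EPIC_ID/tasks/REDACTED_TASK_ID?userId=REDACTED_USER_ID
```

One side effect to be aware of: when a parameter is set, URLSearchParams re-serializes the whole query string, so other values may come back with different percent-encoding (for example a space written as + instead of %20).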

Select the newly added custom variable; its name should now appear in the Value field. Then save your GA4 tag.
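If you load GA4 with the gtag.js snippet instead of GTM, the counterpart of this Fields to Set row is the page_location parameter of the config command. A minimal sketch: the dataLayer and gtag definitions mimic the standard snippet so it runs outside a browser, the Measurement ID is a placeholder, and redactedLocation is a hypothetical, deliberately simplified redaction (userId only).

```javascript
// Stand-in for the standard gtag.js snippet (in a browser these live on window).
var dataLayer = [];
function gtag() { dataLayer.push(arguments); }

// Hypothetical, simplified redaction for illustration: only the userId value.
function redactedLocation(href) {
  return href.replace(/userId=\d+/g, 'userId=REDACTED_USER_ID');
}

// Send the redacted URL as page_location instead of the real one.
gtag('config', 'G-XXXXXXXXX', {
  page_location: redactedLocation('https://www.example.com/tasks?userId=1234567')
});
```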

Now let's test it. Switch GTM to Preview mode and open your website. In GA, go to the DebugView of your GA4 property, wait for a page_view event to appear in the timeline (you may have to reload your page), click on it, and expand the page_location parameter. You should see your redacted URL.
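DebugView shows what a single page view sent. As an optional extra check, a small helper like the hypothetical one below can flag URLs that still look like they contain PII, for example when run over sample URLs from your app in a unit test. Its heuristics (an email address, or four or more digits forming a whole path segment or query value) are assumptions and will produce both false positives and misses.

```javascript
// Hypothetical leak check: returns the reasons a URL may still contain PII.
function findPossiblePii(href) {
  var problems = [];
  // Treat a percent-encoded "@" like a literal one (avoids decodeURIComponent errors).
  var text = href.replace(/%40/gi, '@');

  // Something shaped like local@domain.tld
  if (/[^\s\/?&=#@]+@[^\s\/?&=#@]+\.[a-z]{2,}/i.test(text)) {
    problems.push('email address');
  }
  // A path segment or query value made only of 4+ digits, e.g. /2247/ or =1234567
  if (/[\/=]\d{4,}(?=[\/?&#]|$)/.test(href)) {
    problems.push('long numeric ID');
  }
  return problems;
}

console.log(findPossiblePii('https://www.example.com/company/2247')); // [ 'long numeric ID' ]
```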

That's all, enjoy!