REGEXP_REPLACE And NOT

Asked by At

I wonder whether someone could help me please.

I'm trying to put together a REGEXP_REPLACE query which replaces data but only if it the string matches a certain condition.

This is the query I've come up with:

SELECT
  #select all fields excluding those under the hits record
  * EXCEPT (hits),
  #start array - this rebuilds the hit record
  ARRAY(
  SELECT
    #unnest the hit field, select each field excluding those under the page record
    AS STRUCT * EXCEPT (page),
    (
    SELECT
      #select all page fields excluding pageTitle
      AS STRUCT page.* EXCEPT (pagePath),
      #remove the query parameter from the pagePath fields
      REGEXP_REPLACE(page.pagePath, r'\/invitations\/([a-zA-Z0-9]{8})\/', '/invitations/([a-zA-Z0-9]{8})/redacted') AS pagePath) AS page
  WHERE
    AND NOT page.pagePath= (r'\/invitations\/[a-zA-Z0-9]{8}\/(ltd|limited|co|business')
  FROM
    UNNEST(hits) ) AS hits
FROM
  `Test.Test.ga_sessions_20190401`

The query doesn't work and the part that I'm struggling with is the WHERE NOT. I've tried using AND NOT REGEXP_MATCH in addition to the above, but I can't get this to work.

I just wonder whether someone could take a look at this please and offer some guidance how to solve this?

Many thanks and kind regards

Chris

2 Answers

1
Tamir Klein On Best Solutions

I just wonder whether someone could take a look at this please and offer some guidance on how to solve this?

Two issues in your code:

  1. FROM was after the WHERE
  2. Extra AND after the WHERE

This is the correct Sql:

SELECT
  #select all fields excluding those under the hits record
  * EXCEPT (hits),
  #start array - this rebuilds the hit record
  ARRAY(
  SELECT
    #unnest the hit field, select each field excluding those under the page record
    AS STRUCT * EXCEPT (page),
    (
    SELECT
      #select all page fields excluding pageTitle
      AS STRUCT page.* EXCEPT (pagePath),
      #remove the query parameter from the pagePath fields
      REGEXP_REPLACE(page.pagePath, r'\/invitations\/([a-zA-Z0-9]{8})\/', '/invitations/([a-zA-Z0-9]{8})/redacted') AS pagePath) AS page
  FROM
    UNNEST(hits) AS hits
  WHERE 
    NOT page.pagePath= (r'\/invitations\/[a-zA-Z0-9]{8}\/(ltd|limited|co|business')
    )
FROM
  `Test.Test.ga_sessions_20190401`
1
Mikhail Berlyant On

Below is for Bigquery Standard SQL. The benefit of below solution is that it does not change the structure of underlying table and just does replacement as needed

#standardSQL
SELECT * REPLACE(
  ARRAY(
    SELECT AS STRUCT * REPLACE(
      (SELECT
        AS STRUCT page.* REPLACE(
        REGEXP_REPLACE(page.pagePath, r'\/invitations\/([a-zA-Z0-9]{8})\/', '/invitations/([a-zA-Z0-9]{8})/redacted') AS pagePath)
      ) AS page)
    FROM UNNEST(hits) AS hits
    WHERE NOT page.pagePath= (r'\/invitations\/[a-zA-Z0-9]{8}\/(ltd|limited|co|business')
  ) AS hits)
FROM `Test.Test.ga_sessions_20190401`   

Note using SELECT * REPLACE instead of SELECT * EXCEPT