I'm performing a SCD-Type 2 logic in BigQuery and I need to figure out how to update the end date of an old record that's been updated comparing MD5s

1.2k views Asked by At

Below is an example of my query I have so far. I'm using MD5 hash functions to compare the two tables for updates. I want to be able to update the end_date column in my old record to the current_timestamp when there's an update. And the updated record to to have an end_date of '9999-12-31 23:59:59'

INSERT INTO target_table
SELECT 
a.user_id,
CURRENT_TIMESTAMP() AS start_date,
CAST('9999-12-31 23:59:59' AS TIMESTAMP) AS end_date,
a.first_name
a.md5_cd

FROM
(SELECT * FROM source_table)a
LEFT OUTER JOIN
(SELECT * FROM target_table)b

ON a.md5_cd = b.md5_cd
AND a.user_id = b.user_id
WHERE b.md5_cd IS NULL

Can anyone tell me what needs to be added? I'm trying to avoid using a merge statement. The target_table updates the table with a new record when the md5_cd doesn't match but it doesn't change the end_date to the current timestamp.

This is the result I want if I changed the value of Dan to Pat. enter image description here

but I'm getting this. enter image description here

Here's my source table query:

INSERT INTO source_table
SELECT
A.user_id,
CURRENT_TIMESTAMP() as start_date
CAST('9999-12-31 23:59:59' AS TIMESTAMP) AS end_date,
A.first_name,
to_hex(md5(A.first_name)) as md5_cd

FROM
(
 SELECT
     '1' AS user_id,
     'Dan' as first_name

 )A;
0

There are 0 answers