Can I manually adjust _PARTITIONTIME for historical data ingestion in BigQuery?

170 views Asked by At

I’m tinkering with Google BigQuery and stumbled upon a bit of a head-scratcher regarding the ingestion of some historical data. I've got this dataset that hails from way back in may 2023, and I’m trying to get it into a date-partitioned table in BigQuery.

Now, from what I’ve gathered, _PARTITIONTIME gets set by BigQuery during data ingestion and it’s used for partitioning tables based on when the data was ingested. My pickle is: since the data’s from the history books, I'd like to adjust the _PARTITIONTIME to mirror the original date of the data, not the "just got ingested" date.

So here’s where I need a hand:

Can I modify or manually set _PARTITIONTIME during data ingestion into BigQuery to reflect the actual date of the data rather than the ingestion date? If that’s a no-go, are there any clever workarounds you’d recommend for handling this use case? Just to be clear, I’m really keen on sticking with BigQuery's partitioning mechanism and not adding an extra date/timestamp column to my table.

1

There are 1 answers

2
sarath.mec On

Duplicate of Big Query specify _PARTITIONTIME when inserting from one table into another

Partition in BigQuery can be updated. Google Documentation

Say for eg, I want to change the Partition to Monthly

CREATE TABLE IF NOT EXISTS `schema.data_monthly`
(
  name          STRING,
  id            STRING,
  publish_time  TIMESTAMP
)
PARTITION BY DATE_TRUNC(_PARTITIONTIME, MONTH);

INSERT INTO schema.data_monthly (_PARTITIONTIME, name, id, publish_time) 
SELECT DATE_TRUNC(_PARTITIONTIME, MONTH), name, id, publish_time
FROM schema.daily_data;

Now the partitions would be changed. Also If I try to give a wrong DATE_TRUNC function over WEEK, Big Query Will Error

INSERT INTO els_weather.pub_sub_weather_data_monthly (_PARTITIONTIME, name, id, publish_time) 
SELECT DATE_TRUNC(_PARTITIONTIME, WEEK), subscription_name, message_id, publish_time
FROM els_weather.pub_sub_weather_data

❗ Timestamp pseudo column for partitioned table only supports MONTH (in UTC timezone) granularity values