My goal is to export a BigQuery dataset (or its tables) to Google Cloud Storage (GCS) buckets. While the BigQuery Data Transfer Service is useful for copying data between datasets, it doesn't directly support copying a dataset to GCS. I've gone through various pieces of documentation but couldn't find a clear solution: the Google Cloud Platform (GCP) docs explain how to export individual tables to GCS, but not how to export an entire dataset.
In my exploration I found that exporting tables individually with Terraform is awkward, because Terraform doesn't seem to provide a way to enumerate the tables of a BigQuery dataset. I may be mistaken, though, and there could be other methods or tools that handle this more efficiently. Are there any recommendations or best practices for getting data from BigQuery into GCS this way?
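One idea I've come across is to schedule the BigQuery EXPORT DATA statement as a scheduled query via the Data Transfer Service, which Terraform can create with google_bigquery_data_transfer_config. I haven't tried this yet, and "some_table" below is just a placeholder, so I'm not sure it's the right direction:
# Sketch only: a scheduled query that runs EXPORT DATA every day.
# "some_table" is a placeholder; one statement (or one config) would be
# needed per table, which is part of my problem.
resource "google_bigquery_data_transfer_config" "daily_export" {
  display_name   = "daily-kebron-export"
  data_source_id = "scheduled_query"
  location       = var.region
  schedule       = "every 24 hours"

  params = {
    query = <<-SQL
      EXPORT DATA OPTIONS(
        uri       = 'gs://${var.daily_backup_bucket}/some_table/*.avro',
        format    = 'AVRO',
        overwrite = true
      ) AS
      SELECT * FROM `${var.project}.${var.dataset_id}.some_table`;
    SQL
  }
}
For reference, these are the backup buckets I already manage with Terraform: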
# Resource block for daily backup bucket with a 14-day retention policy
resource "google_storage_bucket" "daily_backup_bucket" {
  name     = var.daily_backup_bucket
  location = var.daily_backup_bucket_location # Different region for daily backup

  lifecycle_rule {
    condition {
      age = var.daily_bucket_age
    }
    action {
      type = var.action_type
    }
  }

  public_access_prevention = var.public_access_prevention
}
# Resource block for weekly backup bucket with a 5-week retention policy and dual-region standard tier
resource "google_storage_bucket" "weekly_backup_standard_bucket" {
  name          = var.weekly_backup_standard_bucket
  location      = var.weekly_backup_standard_bucket_location # Different region for weekly backup
  storage_class = var.storage_class_standard

  lifecycle_rule {
    condition {
      age = var.standard_bucket_age # 5 weeks * 7 days per week
    }
    action {
      type = var.action_type
    }
  }

  # custom_placement_config {
  #   data_locations = [
  #     var.weekly_backup_standard_bucket_location_one,
  #     var.weekly_backup_standard_bucket_location_two
  #   ]
  # }

  public_access_prevention = var.public_access_prevention
}
# Resource block for weekly backup bucket with a 3-month retention policy and dual-region nearline tier
resource "google_storage_bucket" "weekly_backup_nearline_bucket" {
  name          = var.weekly_backup_nearline_bucket
  location      = var.weekly_backup_nearline_bucket_location # Different region for weekly backup
  storage_class = var.storage_class_nearline

  lifecycle_rule {
    condition {
      age = var.nearline_bucket_age # 3 months * 30 days per month
    }
    action {
      type = var.action_type
    }
  }

  custom_placement_config {
    data_locations = [
      var.weekly_backup_nearline_bucket_location_one,
      var.weekly_backup_nearline_bucket_location_two
    ]
  }

  public_access_prevention = var.public_access_prevention
}
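The variable declarations behind these buckets are nothing special; roughly this pattern, repeated for each setting (types inferred from the tfvars further down):
variable "daily_backup_bucket" {
  type        = string
  description = "Name of the daily backup bucket"
}

variable "daily_bucket_age" {
  type        = number
  description = "Age in days after which the daily backup lifecycle rule fires"
}

variable "action_type" {
  type        = string
  description = "Lifecycle action applied to all buckets (e.g. Delete)"
}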
I want to do something like the following if possible (this code is not valid), but I'm not sure how to do it:
resource "google_storage_bucket_object" "exported_tables" {
for_each = { for table in data.google_bigquery_dataset.get_kebron_dataset.id : table.table_id => table }
name = "backup_${formatdate("YYYYMMDD", timestamp())}/${each.value.table_id}.avro"
bucket = var.daily_backup_bucket
source = each.value.export_format_options[0].uri
content = ""
}
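The closest I've come to something that actually plans is giving Terraform the table names myself and creating one extract job per table with google_bigquery_job. This is only a sketch, and kebron_tables is a hypothetical variable I'd have to maintain by hand, which is exactly what I was hoping to avoid:
# Hypothetical list of tables to export; Terraform doesn't appear to be able
# to enumerate these from the dataset for me.
variable "kebron_tables" {
  type    = list(string)
  default = ["table_one", "table_two"] # placeholders
}

resource "google_bigquery_job" "daily_extract" {
  for_each = toset(var.kebron_tables)

  # Jobs are immutable, so the date suffix forces a fresh job on each run.
  job_id   = "daily_extract_${each.key}_${formatdate("YYYYMMDD", timestamp())}"
  location = var.region

  extract {
    destination_uris   = ["gs://${var.daily_backup_bucket}/backup_${formatdate("YYYYMMDD", timestamp())}/${each.key}-*.avro"]
    destination_format = "AVRO"

    source_table {
      project_id = var.project
      dataset_id = var.dataset_id
      table_id   = each.key
    }
  }
}
Even if that worked, I'd still be listing the tables by hand, so I'd welcome a better pattern.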
Here is my terraform.tfvars file:
# General settings
project     = "melodic-map-408207"
region      = "us-central1"
credentials = "credentials.json"

# Config shared by all buckets
action_type              = "Delete"
public_access_prevention = "enforced"

# Daily backup bucket config
daily_backup_bucket          = "kebron_daily_backup_bucket"
daily_backup_bucket_location = "us-west1"
daily_bucket_age             = 14

# Weekly backup standard bucket config
weekly_backup_standard_bucket              = "kebron_weekly_backup_standard_bucket"
storage_class_standard                     = "STANDARD"
standard_bucket_age                        = 35
weekly_backup_standard_bucket_location     = "NAM4"
weekly_backup_standard_bucket_location_one = "US-CENTRAL1"
weekly_backup_standard_bucket_location_two = "US-EAST1"

# Weekly backup nearline bucket config
weekly_backup_nearline_bucket              = "kebron_weekly_backup_nearline_bucket"
storage_class_nearline                     = "NEARLINE"
nearline_bucket_age                        = 90
weekly_backup_nearline_bucket_location     = "EU"
weekly_backup_nearline_bucket_location_one = "EUROPE-WEST1"
weekly_backup_nearline_bucket_location_two = "EUROPE-WEST4"

# BigQuery dataset config
dataset_id            = "kebron_dataset"
dataset_friendly_name = "kebron Dataset"
dataset_description   = "This is an example dataset."
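For context, the data source the pseudo-code above refers to is just this; as far as I can tell it only exposes dataset metadata, not the tables inside it:
data "google_bigquery_dataset" "get_kebron_dataset" {
  dataset_id = var.dataset_id
  project    = var.project
}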