My goal is to export a BigQuery dataset (or its tables) to Google Cloud Storage (GCS) buckets. While the BigQuery Data Transfer Service is useful for copying data between datasets, it doesn't directly support copying a dataset to GCS. I've gone through various pieces of documentation but couldn't find a clear solution: the Google Cloud Platform (GCP) docs explain how to export individual tables to GCS, but not how to export an entire dataset.
In my exploration I found that exporting tables individually with Terraform is awkward, because Terraform doesn't seem to provide a way to enumerate the tables of a BigQuery dataset. I may be mistaken, though, and there could be other methods or tools that handle this more efficiently. Are there any recommendations or best practices for getting data from BigQuery into GCS this way?
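One idea I've come across is to schedule the BigQuery EXPORT DATA statement as a scheduled query via the Data Transfer Service, which Terraform can create with google_bigquery_data_transfer_config. I haven't tried this yet, and "some_table" below is just a placeholder, so I'm not sure it's the right direction:
# Sketch only: a scheduled query that runs EXPORT DATA every day.
# "some_table" is a placeholder; one statement (or one config) would be
# needed per table, which is part of my problem.
resource "google_bigquery_data_transfer_config" "daily_export" {
  display_name   = "daily-kebron-export"
  data_source_id = "scheduled_query"
  location       = var.region
  schedule       = "every 24 hours"

  params = {
    query = <<-SQL
      EXPORT DATA OPTIONS(
        uri       = 'gs://${var.daily_backup_bucket}/some_table/*.avro',
        format    = 'AVRO',
        overwrite = true
      ) AS
      SELECT * FROM `${var.project}.${var.dataset_id}.some_table`;
    SQL
  }
}
For reference, these are the backup buckets I already manage with Terraform: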
# Resource block for daily backup bucket with a 14-day retention policy
resource "google_storage_bucket" "daily_backup_bucket" {
  name     = var.daily_backup_bucket
  location = var.daily_backup_bucket_location # Different region for daily backup

  lifecycle_rule {
    condition {
      age = var.daily_bucket_age
    }
    action {
      type = var.action_type
    }
  }

  public_access_prevention = var.public_access_prevention
}
# Resource block for weekly backup bucket with a 5-week retention policy and dual-region standard tier
resource "google_storage_bucket" "weekly_backup_standard_bucket" {
  name          = var.weekly_backup_standard_bucket
  location      = var.weekly_backup_standard_bucket_location # Different region for weekly backup
  storage_class = var.storage_class_standard

  lifecycle_rule {
    condition {
      age = var.standard_bucket_age # 5 weeks * 7 days per week
    }
    action {
      type = var.action_type
    }
  }

  # custom_placement_config {
  #   data_locations = [
  #     var.weekly_backup_standard_bucket_location_one,
  #     var.weekly_backup_standard_bucket_location_two
  #   ]
  # }

  public_access_prevention = var.public_access_prevention
}
# Resource block for weekly backup bucket with a 3-month retention policy and dual-region nearline tier
resource "google_storage_bucket" "weekly_backup_nearline_bucket" {
  name          = var.weekly_backup_nearline_bucket
  location      = var.weekly_backup_nearline_bucket_location # Different region for weekly backup
  storage_class = var.storage_class_nearline

  lifecycle_rule {
    condition {
      age = var.nearline_bucket_age # 3 months * 30 days per month
    }
    action {
      type = var.action_type
    }
  }

  custom_placement_config {
    data_locations = [
      var.weekly_backup_nearline_bucket_location_one,
      var.weekly_backup_nearline_bucket_location_two
    ]
  }

  public_access_prevention = var.public_access_prevention
}
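The variable declarations behind these buckets are nothing special; roughly this pattern, repeated for each setting (types inferred from the tfvars further down):
variable "daily_backup_bucket" {
  type        = string
  description = "Name of the daily backup bucket"
}

variable "daily_bucket_age" {
  type        = number
  description = "Age in days after which the daily backup lifecycle rule fires"
}

variable "action_type" {
  type        = string
  description = "Lifecycle action applied to all buckets (e.g. Delete)"
}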
I want to do something like the following if possible (this code is not valid), but I'm not sure how to do it:
resource "google_storage_bucket_object" "exported_tables" {
for_each = { for table in data.google_bigquery_dataset.get_kebron_dataset.id : table.table_id => table }
name = "backup_${formatdate("YYYYMMDD", timestamp())}/${each.value.table_id}.avro"
bucket = var.daily_backup_bucket
source = each.value.export_format_options[0].uri
content = ""
}
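The closest I've come to something that actually plans is giving Terraform the table names myself and creating one extract job per table with google_bigquery_job. This is only a sketch, and kebron_tables is a hypothetical variable I'd have to maintain by hand, which is exactly what I was hoping to avoid:
# Hypothetical list of tables to export; Terraform doesn't appear to be able
# to enumerate these from the dataset for me.
variable "kebron_tables" {
  type    = list(string)
  default = ["table_one", "table_two"] # placeholders
}

resource "google_bigquery_job" "daily_extract" {
  for_each = toset(var.kebron_tables)

  # Jobs are immutable, so the date suffix forces a fresh job on each run.
  job_id   = "daily_extract_${each.key}_${formatdate("YYYYMMDD", timestamp())}"
  location = var.region

  extract {
    destination_uris   = ["gs://${var.daily_backup_bucket}/backup_${formatdate("YYYYMMDD", timestamp())}/${each.key}-*.avro"]
    destination_format = "AVRO"

    source_table {
      project_id = var.project
      dataset_id = var.dataset_id
      table_id   = each.key
    }
  }
}
Even if that worked, I'd still be listing the tables by hand, so I'd welcome a better pattern.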
Here is my terraform.tfvars file:
# General settings
project     = "melodic-map-408207"
region      = "us-central1"
credentials = "credentials.json"

# Config shared by all buckets
action_type              = "Delete"
public_access_prevention = "enforced"

# Daily backup bucket config
daily_backup_bucket          = "kebron_daily_backup_bucket"
daily_backup_bucket_location = "us-west1"
daily_bucket_age             = 14

# Weekly backup standard bucket config
weekly_backup_standard_bucket              = "kebron_weekly_backup_standard_bucket"
storage_class_standard                     = "STANDARD"
standard_bucket_age                        = 35
weekly_backup_standard_bucket_location     = "NAM4"
weekly_backup_standard_bucket_location_one = "US-CENTRAL1"
weekly_backup_standard_bucket_location_two = "US-EAST1"

# Weekly backup nearline bucket config
weekly_backup_nearline_bucket              = "kebron_weekly_backup_nearline_bucket"
storage_class_nearline                     = "NEARLINE"
nearline_bucket_age                        = 90
weekly_backup_nearline_bucket_location     = "EU"
weekly_backup_nearline_bucket_location_one = "EUROPE-WEST1"
weekly_backup_nearline_bucket_location_two = "EUROPE-WEST4"

# BigQuery dataset config
dataset_id            = "kebron_dataset"
dataset_friendly_name = "kebron Dataset"
dataset_description   = "This is an example dataset."
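For context, the data source the pseudo-code above refers to is just this; as far as I can tell it only exposes dataset metadata, not the tables inside it:
data "google_bigquery_dataset" "get_kebron_dataset" {
  dataset_id = var.dataset_id
  project    = var.project
}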