google-cloud-platform - Provisioning BigQuery datasets with Terraform

Tags: google-cloud-platform google-bigquery terraform terraform-provider-gcp

I am new to GCP and Terraform. I am writing Terraform scripts to provision around 50 BQ datasets, each containing at least 10 tables. The tables do not all share the same schema.

I have already written the scripts that create the datasets and the tables, but I am stuck on adding schemas to the tables and need help with that. I am using Terraform variables to build the scripts.

Here is my code. I need to integrate logic that creates the schema for each table.

variables.tf

variable "test_bq_dataset" {
  type = list(object({
    id       = string
    location = string
  }))
}

variable "test_bq_table" {
  type = list(object({
    dataset_id = string
    table_id   = string
  }))
}

terraform.tfvars

test_bq_dataset = [{
  id       = "ds1"
  location = "US"
  },
  {
    id       = "ds2"
    location = "US"
  }
]

test_bq_table = [{
  dataset_id = "ds1"
  table_id   = "table1"
  },
  {
    dataset_id = "ds2"
    table_id   = "table2"
  },
  {
    dataset_id = "ds1"
    table_id   = "table3"
  }
]

main.tf

resource "google_bigquery_dataset" "dataset" {
  count      = length(var.test_bq_dataset)
  dataset_id = var.test_bq_dataset[count.index]["id"]
  location   = var.test_bq_dataset[count.index]["location"]
  labels = {
    "environment" = "development"
  }
}


resource "google_bigquery_table" "table" {
  count = length(var.test_bq_table)
  dataset_id = var.test_bq_table[count.index]["dataset_id"]
  table_id   = var.test_bq_table[count.index]["table_id"]
  labels = {
    "environment" = "development"
  }
  depends_on = [
    google_bigquery_dataset.dataset,
  ]
}

I have tried every approach I could think of to create schemas for the tables in the datasets, but none of them worked.

Best answer

Presumably all your tables are supposed to have the same schema...

I would try it this way.

In the resource "google_bigquery_table" "table", for example, you can add the following after the labels:

schema = file("${path.root}/subdirectories-path/table_schema.json")

where

  • ${path.root} - the directory from which your Terraform configuration is run
  • subdirectories-path - zero or more subdirectories
  • table_schema.json - a JSON file containing the table schema
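
Applied to the "google_bigquery_table" resource from the question, that could look like the sketch below. The file name table_schema.json and its location at the configuration root are assumptions; adjust the path to wherever you keep the schema file.

resource "google_bigquery_table" "table" {
  count      = length(var.test_bq_table)
  dataset_id = var.test_bq_table[count.index]["dataset_id"]
  table_id   = var.test_bq_table[count.index]["table_id"]
  labels = {
    "environment" = "development"
  }
  # Every table gets the same schema, read from a single JSON file (assumed path).
  schema = file("${path.root}/table_schema.json")
  depends_on = [
    google_bigquery_dataset.dataset,
  ]
}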

==> Update, 14 February 2021

As requested, here is an example where the table schemas differ, with minimal modifications to the original question.

variables.tf

variable "project_id" {
  description = "The target project"
  type        = string
  default     = "ishim-sample"
}

variable "region" {
  description = "The region where resources are created => europe-west2"
  type        = string
  default     = "europe-west2"
}

variable "zone" {
  description = "The zone in the europe-west region for resources"
  type        = string
  default     = "europe-west2-b"
}

# ===========================
variable "test_bq_dataset" {
  type = list(object({
    id       = string
    location = string
  }))
}

variable "test_bq_table" {
  type = list(object({
    dataset_id = string
    table_id   = string
    schema_id  = string
  }))
}

terraform.tfvars

test_bq_dataset = [
  {
    id       = "ds1"
    location = "EU"
  },
  {
    id       = "ds2"
    location = "EU"
  }
]

test_bq_table = [
  {
    dataset_id = "ds1"
    table_id   = "table1"
    schema_id  = "table-schema-01.json"
  },
  {
    dataset_id = "ds2"
    table_id   = "table2"
    schema_id  = "table-schema-02.json"
  },
  {
    dataset_id = "ds1"
    table_id   = "table3"
    schema_id  = "table-schema-03.json"
  },
  {
    dataset_id = "ds2"
    table_id   = "table4"
    schema_id  = "table-schema-04.json"
  }
]

An example of a JSON schema file - table-schema-01.json

[
  {
    "name": "table_column_01",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": ""
  },
  {
    "name": "_gcs_file_path",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": "The GCS path to the file for loading."
  },
  {
    "name": "_src_file_ts",
    "mode": "REQUIRED",
    "type": "TIMESTAMP",
    "description": "The source file modification timestamp."
  },
  {
    "name": "_src_file_name",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": "The file name of the source file."
  },
  {
    "name": "_firestore_doc_id",
    "mode": "REQUIRED",
    "type": "STRING",
    "description": "The hash code (based on the file name and its content, so each file has a unique hash) used as a Firestore document id."
  },
  {
    "name": "_ingested_ts",
    "mode": "REQUIRED",
    "type": "TIMESTAMP",
    "description": "The timestamp when this record was processed during ingestion into the BigQuery table."
  }
]

main.tf

provider "google" {
  project = var.project_id
  region  = var.region
  zone    = var.zone
}

resource "google_bigquery_dataset" "test_dataset_set" {
  project    = var.project_id
  count      = length(var.test_bq_dataset)
  dataset_id = var.test_bq_dataset[count.index]["id"]
  location   = var.test_bq_dataset[count.index]["location"]

  labels = {
    "environment" = "development"
  }
}

resource "google_bigquery_table" "test_table_set" {
  project    = var.project_id
  count      = length(var.test_bq_table)
  dataset_id = var.test_bq_table[count.index]["dataset_id"]
  table_id   = var.test_bq_table[count.index]["table_id"]
  schema     = file("${path.root}/bq-schema/${var.test_bq_table[count.index]["schema_id"]}")

  labels = {
    "environment" = "development"
  }
  depends_on = [
    google_bigquery_dataset.test_dataset_set,
  ]
}
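
As a side note (not part of the original answer), the same resource can be written with for_each keyed by dataset and table name, so that adding or removing an entry in test_bq_table does not shift the indexes of the remaining tables. This is just a sketch of that alternative; the variable shapes are the same as above.

resource "google_bigquery_table" "test_table_set" {
  # Key each instance by "<dataset_id>.<table_id>" instead of a list index.
  for_each = { for t in var.test_bq_table : "${t.dataset_id}.${t.table_id}" => t }

  project    = var.project_id
  dataset_id = each.value.dataset_id
  table_id   = each.value.table_id
  schema     = file("${path.root}/bq-schema/${each.value.schema_id}")

  labels = {
    "environment" = "development"
  }
  depends_on = [
    google_bigquery_dataset.test_dataset_set,
  ]
}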

Project directory structure - screenshot

Keep in mind the subdirectory name "bq-schema", because it is used in the "schema" attribute of the "google_bigquery_table" resource in the "main.tf" file.
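
Since the screenshot is not reproduced here, the layout it illustrates is roughly the following (the schema file names are taken from terraform.tfvars above; the rest of the tree is assumed):

.
├── main.tf
├── variables.tf
├── terraform.tfvars
└── bq-schema/
    ├── table-schema-01.json
    ├── table-schema-02.json
    ├── table-schema-03.json
    └── table-schema-04.json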

BigQuery console - screenshot

The result of the "terraform apply" command.
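
For reference, this result comes from the standard Terraform workflow (these commands are not shown in the original post):

terraform init
terraform plan
terraform apply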

Regarding google-cloud-platform - Provisioning BigQuery datasets with Terraform, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/66172075/
