google-bigquery - BigQuery: Concatenate two arrays and keep distinct values within MERGE statement

Tags: google-bigquery concatenation etl array-merge sql-merge

I'm working on a MERGE process that updates an array field with new data, but only with values that are not already in the array.

target table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   |  [1,2,3] |
| b   |    [0]   |
+-----+----------+

source table
+-----+----------+
| id  |  arr_col |
+-----+----------+
| a   |  [3,4,5] |
| b   |  [0,0]   |
+-----+----------+

target table post-merge
+-----+-------------+
| id  |  arr_col    |
+-----+-------------+
| a   | [1,2,3,4,5] |
| b   |    [0]      |
+-----+-------------+

I'm trying to use the SQL from this answer in my MERGE statement:

merge into target
using source
  on target.id = source.id
when matched then
update set target.arr_col = array(
                             select distinct x 
                             from unnest(array_concat(target.arr_col, source.arr_col)) x
                            )

But BigQuery shows me the following error: Correlated Subquery is unsupported in UPDATE clause.

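Is there another way to update this array field via MERGE? The target and source tables can be quite large, and the process will run daily. So I'd like this to be an incremental update rather than recreating the whole table with the new data every time.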

Best answer

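Below is for BigQuery Standard SQL. The array de-duplication is moved out of the UPDATE clause and into the USING subquery, so the UPDATE only assigns a precomputed column: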

merge into target
using (
  select id, 
    array(
      select distinct x 
      from unnest(source.arr_col || target.arr_col) as x
      order by x
    ) as arr_col
  from source 
  join target
  using(id)
) source
  on target.id = source.id
when matched then
update set target.arr_col = source.arr_col;
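
To try the statement above against the sample data from the question, here is a minimal sketch. It assumes the statements are run together as one multi-statement query (script), and it uses temporary tables named target and source purely for illustration; in a real job these would be your existing tables.

```sql
-- Sample data from the question, held in temporary tables
-- (the temp tables are an assumption made for this illustration).
create temp table target as
select 'a' as id, [1, 2, 3] as arr_col union all
select 'b', [0];

create temp table source as
select 'a' as id, [3, 4, 5] as arr_col union all
select 'b', [0, 0];

-- Same MERGE as in the answer: the distinct, concatenated array is
-- computed per id in the USING subquery, not in the UPDATE clause.
merge into target
using (
  select id,
    array(
      select distinct x
      from unnest(source.arr_col || target.arr_col) as x
      order by x
    ) as arr_col
  from source
  join target
  using(id)
) source
  on target.id = source.id
when matched then
update set target.arr_col = source.arr_col;

-- Inspect the result.
select id, arr_col
from target
order by id;
```

With this input, row a should end up with [1,2,3,4,5] and row b with [0], matching the post-merge table in the question. Note that the inner join in the USING subquery means ids present only in source are not touched by this statement.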

Source: the original question, "BigQuery: Concatenate two arrays and keep distinct values within MERGE statement", is on Stack Overflow: https://stackoverflow.com/questions/64581602/
