sql - Summing repeated fields in BigQuery

Tags: sql google-bigquery

I'll try to explain my problem as clearly as I can; please let me know if anything is unclear.

I have a table [MyTable] that looks like this:

----------------------------------------
|chn:integer | auds:integer (repeated) |
----------------------------------------
|1           |3916                     |
|1           |4983                     |
|1           |6233                     |
|1           |1214                     |
|2           |1200                     |
|2           |900                      |
|2           |2030                     |
|2           |2345                     |
----------------------------------------
auds is always repeated 4 times.

If I query SELECT chn, auds FROM [MyTable] WHERE chn = 1, I get the following result:
-------------------
|Row | chn | auds |
-------------------
|1   |1    |3916  |
|2   |1    |4983  |  
|3   |1    |6233  |  
|4   |1    |1214  |
-------------------

If I query SELECT chn, auds FROM [MyTable] WHERE (chn = 1 OR chn = 2), I get the following result:
-------------------
|Row | chn | auds |
-------------------
|1   |1    |3916  |
|2   |1    |4983  |
|3   |1    |6233  |
|4   |1    |1214  |
|5   |2    |1200  |
|6   |2    |900   |
|7   |2    |2030  |
|8   |2    |2345  |
-------------------

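Logically, I get twice as many rows. What I actually want is the SUM() of the repeated field auds across chn = 1 and chn = 2, element by element. Visually, something like this: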
-------------------
|Row | chn | auds |
-------------------
|1   |3    |5116  |
|2   |3    |5883  |  
|3   |3    |8263  |  
|4   |3    |3559  |
-------------------

I tried something like this:
SELECT a1+a2 FROM

(SELECT auds AS a1 FROM [MyTable] WHERE chn = 1),
(SELECT auds AS a2 FROM [MyTable] WHERE chn = 2)

But I get the following error:
Error: Cannot query the cross product of repeated fields a1 and a2.

Best answer

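This kind of logic is much easier to express in standard SQL (uncheck "Use Legacy SQL" under "Show Options"). Here is an example that computes the sum of each auds array: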

WITH MyTable AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
)
SELECT
  chn,
  (SELECT SUM(aud) FROM UNNEST(auds) AS aud) AS auds_sum
FROM MyTable;
+-----+----------+
| chn | auds_sum |
+-----+----------+
|   1 |       20 |
|   2 |       45 |
+-----+----------+

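Here is another that computes the pairwise sums of the arrays for chn = 1 and chn = 2 (which, based on your question, I think is what you want):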
WITH MyTable AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
)
SELECT
  ARRAY(SELECT first_aud + second_auds[OFFSET(off)]
        FROM UNNEST(first_auds) AS first_aud WITH OFFSET off)
    AS summed_auds
FROM (
  SELECT
    (SELECT auds FROM MyTable WHERE chn = 1) AS first_auds,
    (SELECT auds FROM MyTable WHERE chn = 2) AS second_auds
);
+---------------------+
|    summed_auds      |
+---------------------+
| [9, 11, 13, 15, 17] |
+---------------------+

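Edit: another example, which sums the corresponding array elements across all rows. It is probably not particularly efficient, but it should produce the expected result: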
WITH MyTable AS (
  SELECT
    1 AS chn,
    [2, 3, 4, 5, 6] AS auds
  UNION ALL SELECT
    2 AS chn,
    [7, 8, 9, 10, 11] AS auds
  UNION ALL SELECT
    3 AS chn,
    [-1, -6, 2, 3, 2] AS auds
)
SELECT
  ARRAY(SELECT
          (SELECT SUM(auds[OFFSET(off)]) FROM UNNEST(all_auds))
        FROM UNNEST(all_auds[OFFSET(0)].auds) WITH OFFSET off)
    AS summed_auds
FROM (
  SELECT
    ARRAY_AGG(STRUCT(auds)) AS all_auds
  FROM MyTable
);
+--------------------+
|    summed_auds     |
+--------------------+
| [8, 5, 15, 18, 19] |
+--------------------+
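
As a possible alternative for the element-wise sum across rows, the arrays can be flattened with their positions, grouped by position, and reassembled. The following is a minimal sketch and not part of the original answer; it assumes standard SQL, and the inline MyTable simply reuses the sample values from the question:

```sql
-- Assumed sample data: the values shown in the question's table.
WITH MyTable AS (
  SELECT 1 AS chn, [3916, 4983, 6233, 1214] AS auds
  UNION ALL
  SELECT 2 AS chn, [1200, 900, 2030, 2345] AS auds
)
SELECT
  -- Rebuild one array, keeping the original element positions.
  ARRAY_AGG(aud_sum ORDER BY off) AS summed_auds
FROM (
  SELECT
    off,
    SUM(aud) AS aud_sum            -- sum of the elements sharing a position
  FROM MyTable, UNNEST(auds) AS aud WITH OFFSET AS off
  WHERE chn IN (1, 2)              -- restrict to the channels of interest
  GROUP BY off
);
```

With these sample values the query should return [5116, 5883, 8263, 3559], which matches the result asked for in the question. Grouping by offset avoids indexing into every array once per position, at the cost of flattening the data first.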

Regarding "sql - Summing repeated fields in BigQuery", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/38978805/
