hadoop - Cannot aggregate inside a CASE statement in a Hive query

Tags: hadoop join hive case

I have the following data:

SELECT 
    mtrans.merch_num,
    mtrans.card_num 
FROM a_sbp_db.merch_trans_daily mtrans 
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num 
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30;



+-----------+----------------------------+
| merch_num | card_num                   |
+-----------+----------------------------+
|         1 | 4658XXXXXXXXXXXXXXXXXXURMX |
|         2 | 4658XXXXXXXXXXXXXXXXXXIE6X |
|         2 | 4658XXXXXXXXXXXXXXXXXXDA8X |
|         2 | 4658XXXXXXXXXXXXXXXXXX7D1X |
|         2 | 4658XXXXXXXXXXXXXXXXXXTJ2X |
|         2 | 4658XXXXXXXXXXXXXXXXXXQQWX |
|         2 | 4659XXXXXXXXXXXXXXXXXXY4EX |
|         2 | 4658XXXXXXXXXXXXXXXXXXRDOX |
|         2 | 4658XXXXXXXXXXXXXXXXXX0O3X |
|         2 | 4658XXXXXXXXXXXXXXXXXXNVBX |
+-----------+----------------------------+

I want to sum trans_amt by merch_num, but only when the merchant has more than 1 unique card_num.
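Before writing any SQL, the rule in the sentence above can be stated in plain Python. This is a minimal sketch. It assumes hypothetical `(merch_num, card_num, trans_amt)` tuples that are already filtered to a single age bucket. The function name is illustrative only.

```python
from collections import defaultdict


def totals_for_multi_card_merchants(rows):
    """Sum trans_amt per merchant, keeping only merchants with more than one distinct card.

    rows: iterable of (merch_num, card_num, trans_amt) tuples (hypothetical shape).
    """
    cards = defaultdict(set)     # merch_num -> distinct card numbers seen
    totals = defaultdict(float)  # merch_num -> running sum of trans_amt
    for merch_num, card_num, trans_amt in rows:
        cards[merch_num].add(card_num)
        totals[merch_num] += trans_amt
    # Python equivalent of HAVING COUNT(DISTINCT card_num) > 1
    return {m: total for m, total in totals.items() if len(cards[m]) > 1}


rows = [
    (1, "card_a", 2.5),
    (2, "card_c", 1000.0),
    (2, "card_d", 1147.5),
]
result = totals_for_multi_card_merchants(rows)
print(result)  # {2: 2147.5}
```

The card count and the sum are two separate per-group results. The filter compares one against the other only after both are complete, and that ordering is what the SQL has to reproduce.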

In a simple query I can do this:

SELECT 
    mtrans.merch_num,
    FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
    SUM(mtrans.trans_amt) AS total_age_less_30_1 
FROM a_sbp_db.merch_trans_daily mtrans 
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num 
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' AND person_org_code='P' AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 
GROUP BY 
    mtrans.merch_num
HAVING COUNT(DISTINCT mtrans.card_num) > 1;

+-----------+---------------+---------------------+
| merch_num | process_month | total_age_less_30_1 |
+-----------+---------------+---------------------+
|         2 | Nov-2017      | 2147.5              |
+-----------+---------------+---------------------+

Here I can skip merchant 5493036, because it does not have more than 1 unique card.
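The WHERE + HAVING query above works because WHERE filters rows first, and HAVING then drops whole groups after COUNT(DISTINCT ...) has been computed over the surviving rows. Below is a minimal sketch using Python's built-in `sqlite3` module. It assumes a hypothetical table `trans` with the customer's age precomputed, because SQLite has no `DATEDIFF`. Merchant 1 has two cards overall but only one under 30, so it is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per transaction, age already computed.
conn.execute(
    "CREATE TABLE trans (merch_num INTEGER, card_num TEXT, age INTEGER, trans_amt REAL)"
)
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?, ?)",
    [
        (1, "card_a", 25, 2.5),
        (1, "card_a", 35, 100.25),
        (1, "card_b", 38, 103.5),
        (2, "card_c", 22, 1000.0),
        (2, "card_d", 28, 1147.5),
    ],
)

# WHERE keeps only under-30 rows; HAVING then drops merchants with <= 1 distinct card.
result = conn.execute(
    """
    SELECT merch_num, SUM(trans_amt) AS total_age_less_30_1
    FROM trans
    WHERE age < 30
    GROUP BY merch_num
    HAVING COUNT(DISTINCT card_num) > 1
    ORDER BY merch_num
    """
).fetchall()
print(result)  # [(2, 2147.5)]
```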

But I have multiple WHERE conditions and want to write only 1 query. Using CASE statements, I can do it like this:

SELECT mtrans.merch_num,
    FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
    NVL(SUM(CASE
        WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30)
            THEN mtrans.trans_amt ELSE 0 END), NULL)
            AS total_age_less_30_1,
    NVL(SUM(CASE
        WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
                    AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40)
            THEN mtrans.trans_amt ELSE 0 END), NULL)
            AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id   
WHERE mtrans.transaction_date LIKE '2017-09%'
    AND person_org_code='P'
GROUP BY
    mtrans.merch_num;

+-----------+---------------+---------------------+-------------------+
| merch_num | process_month | total_age_less_30_1 | total_age_30_40_1 |
+-----------+---------------+---------------------+-------------------+
|       3   | Nov-2017      | 0                   | 0                 |
|       4   | Nov-2017      | 0                   | 0                 |
|       1   | Nov-2017      | 2.49                | 203.68            |
|       2   | Nov-2017      | 2147.5              | 4907              |
|       5   | Nov-2017      | 0                   | 0                 |
+-----------+---------------+---------------------+-------------------+
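The CASE-inside-SUM pattern above is conditional aggregation: each row adds its amount to a bucket's SUM only when the row matches that bucket. Nothing in it looks at how many distinct cards fed the bucket. Below is a minimal sketch with Python's `sqlite3` module on a hypothetical table `trans` with age precomputed. It reproduces the problem: merchant 1 gets a non-zero under-30 total even though only one card falls in that bucket.

```python
import sqlite3

# Hypothetical sample rows: (merch_num, card_num, age, trans_amt)
ROWS = [
    (1, "card_a", 25, 2.5),
    (1, "card_a", 35, 100.25),
    (1, "card_b", 38, 103.5),
    (2, "card_c", 22, 1000.0),
    (2, "card_d", 28, 1147.5),
    (2, "card_e", 33, 4000.0),
    (2, "card_f", 36, 907.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (merch_num INTEGER, card_num TEXT, age INTEGER, trans_amt REAL)"
)
conn.executemany("INSERT INTO trans VALUES (?, ?, ?, ?)", ROWS)

# Each CASE routes a row's amount into exactly one bucket (or 0 otherwise).
result = conn.execute(
    """
    SELECT merch_num,
           SUM(CASE WHEN age < 30 THEN trans_amt ELSE 0 END) AS total_age_less_30_1,
           SUM(CASE WHEN age >= 30 AND age < 40 THEN trans_amt ELSE 0 END) AS total_age_30_40_1
    FROM trans
    GROUP BY merch_num
    ORDER BY merch_num
    """
).fetchall()
print(result)  # [(1, 2.5, 203.75), (2, 2147.5, 4907.0)]
```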

I want 2.49 to be NULL, because that merchant does not have more than 1 unique card.

I can't find a way to add a condition that checks whether the number of unique cards is greater than 1 and only then shows sum(trans_amt).

When I add that condition inside the CASE statement, I get the following error:

SELECT 
    mtrans.merch_num,
    FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
    NVL(SUM(CASE
        WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30 and count(distinct mtrans.card_num) > 1) 
            THEN mtrans.trans_amt ELSE 0 END), NULL)
            AS total_age_less_30_1,
    NVL(SUM(CASE
        WHEN (ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
                    AND     ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40 and count(distinct mtrans.card_num) > 1)
            THEN mtrans.trans_amt ELSE 0 END), NULL)
            AS total_age_30_40_1                
FROM a_sbp_db.merch_trans_daily mtrans 
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num 
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' 
    AND person_org_code='P' 
GROUP BY 
    mtrans.merch_num;


ERROR: AnalysisException: aggregate function must not contain aggregate parameters: sum(CASE WHEN (round(datediff(mtrans.transaction_date, cdemo.date_birth) / 365) < 30 AND count(DISTINCT mtrans.card_num) > 1) THEN mtrans.trans_amt ELSE 0 END)
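The rule behind this error is not specific to one engine. An aggregate function's argument is evaluated once per row, so it cannot itself contain another aggregate such as COUNT(DISTINCT ...). Below is a minimal sketch with Python's `sqlite3` module on a hypothetical empty table `trans`. SQLite rejects the same query shape when the statement is prepared, with its own error wording rather than the `AnalysisException` text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (merch_num INTEGER, card_num TEXT, age INTEGER, trans_amt REAL)"
)

error = None
try:
    # COUNT(DISTINCT ...) nested inside SUM(...): invalid in standard SQL.
    conn.execute(
        """
        SELECT merch_num,
               SUM(CASE WHEN age < 30 AND COUNT(DISTINCT card_num) > 1
                        THEN trans_amt ELSE 0 END)
        FROM trans
        GROUP BY merch_num
        """
    )
except sqlite3.OperationalError as exc:
    error = str(exc)

print(error)
```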

Can anyone help?

Best answer

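The error is caused by the COUNT inside your SUM: an aggregate function cannot take another aggregate as its argument. Try the following and let me know how it goes: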

SELECT 
    mtrans.merch_num,
    FROM_UNIXTIME(UNIX_TIMESTAMP(),'MMM-yyyy') AS process_month,
    -- Check the distinct-card count per age bucket OUTSIDE of SUM; the age condition
    -- goes INSIDE both aggregates. The column is NULL when the bucket has <= 1 card.
    CASE
        WHEN COUNT(DISTINCT CASE WHEN ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30
                                 THEN mtrans.card_num END) > 1
            THEN SUM(CASE WHEN ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 30
                          THEN mtrans.trans_amt END)
    END AS total_age_less_30_1,
    CASE
        WHEN COUNT(DISTINCT CASE WHEN ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
                                  AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40
                                 THEN mtrans.card_num END) > 1
            THEN SUM(CASE WHEN ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) >= 30
                           AND ROUND(DATEDIFF(mtrans.transaction_date,cdemo.date_birth)/365) < 40
                          THEN mtrans.trans_amt END)
    END AS total_age_30_40_1
FROM a_sbp_db.merch_trans_daily mtrans 
INNER JOIN a_sbp_db.product_holding ph ON mtrans.card_num = ph.acc_num 
INNER JOIN a_sbp_db.cust_demo cdemo ON cdemo.cust_id = ph.cust_id
WHERE mtrans.transaction_date LIKE '2017-09%' 
    AND person_org_code='P' 
GROUP BY 
    mtrans.merch_num;
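Any fix has to keep the distinct-card check out of SUM's argument. Below are two ways to do that, compared side by side. This is a minimal sketch with Python's `sqlite3` module on a hypothetical table `trans` with age precomputed. It assumes the distinct-card rule should apply per age bucket, which matches the HAVING query in the question.

- **One pass.** Wrap each bucket's SUM in a CASE driven by `COUNT(DISTINCT CASE WHEN <bucket> THEN card_num END)`. COUNT(DISTINCT ...) ignores the NULLs produced for rows outside the bucket.
- **Two stage.** First aggregate per (merchant, bucket) with `HAVING COUNT(DISTINCT card_num) > 1`, then pivot the buckets into columns.

```python
import sqlite3

# Hypothetical sample rows: (merch_num, card_num, age, trans_amt)
ROWS = [
    (1, "card_a", 25, 2.5),
    (1, "card_a", 35, 100.25),
    (1, "card_b", 38, 103.5),
    (2, "card_c", 22, 1000.0),
    (2, "card_d", 28, 1147.5),
    (2, "card_e", 33, 4000.0),
    (2, "card_f", 36, 907.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE trans (merch_num INTEGER, card_num TEXT, age INTEGER, trans_amt REAL)"
)
conn.executemany("INSERT INTO trans VALUES (?, ?, ?, ?)", ROWS)

# One pass: the distinct-card check sits outside SUM; the bucket condition is inside both.
ONE_PASS = """
    SELECT merch_num,
           CASE WHEN COUNT(DISTINCT CASE WHEN age < 30 THEN card_num END) > 1
                THEN SUM(CASE WHEN age < 30 THEN trans_amt END) END,
           CASE WHEN COUNT(DISTINCT CASE WHEN age >= 30 AND age < 40 THEN card_num END) > 1
                THEN SUM(CASE WHEN age >= 30 AND age < 40 THEN trans_amt END) END
    FROM trans
    GROUP BY merch_num
    ORDER BY merch_num
"""

# Two stage: keep (merchant, bucket) groups with > 1 distinct card, then pivot.
TWO_STAGE = """
    SELECT merch_num,
           MAX(CASE WHEN bucket = 'lt30' THEN total END),
           MAX(CASE WHEN bucket = '30_40' THEN total END)
    FROM (
        SELECT merch_num, bucket, SUM(trans_amt) AS total
        FROM (
            SELECT merch_num, card_num, trans_amt,
                   CASE WHEN age < 30 THEN 'lt30'
                        WHEN age < 40 THEN '30_40' END AS bucket
            FROM trans
        ) tagged
        GROUP BY merch_num, bucket
        HAVING COUNT(DISTINCT card_num) > 1
    ) per_bucket
    GROUP BY merch_num
    ORDER BY merch_num
"""

one_pass = conn.execute(ONE_PASS).fetchall()
two_stage = conn.execute(TWO_STAGE).fetchall()
print(one_pass)   # [(1, None, 203.75), (2, 2147.5, 4907.0)]
print(two_stage)  # [(1, None, 203.75), (2, 2147.5, 4907.0)]
```

The two forms differ in one edge case. The two-stage form omits a merchant entirely when none of its buckets qualifies, while the one-pass form returns that merchant with all NULLs. Some Impala versions accept only one COUNT(DISTINCT ...) expression per query. The two-stage form needs only one, so it is a fallback if the one-pass form is rejected.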

Regarding hadoop - Cannot aggregate inside a CASE statement in a Hive query, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47207033/

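Related articles:

php - Performing a join on ZenCart address book entries

Hive partition recovery

sql - Hive View has no path?

python - Are Python and Hadoop a suitable choice for this case?

amazon-web-services - Reading data from S3 into a Spark dataframe using Scala

java - Using oneToMany with a join table and mapping it to a child entity

windows - Error running Hadoop on Windows via Cygwin: Could Not Locate null\bin\winutils.exe

mysql - Selecting multiple rows from two tables based on duplicates in one table

arrays - Searching an array in Hive

Hadoop hive : How to allow regular user continuously write data and create tables in warehouse directory?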