oracle - Find the sum of the current month's and the previous month's values

Tags: oracle hive impala

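I have a source table that holds employee account details for each month. The date is stored as a string (yyyyMMdd). For each account, I am trying to find the sum of the current month's values and the sum of the previous month's values.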

Source data:

+-----------+-------------+-----------+----------+
|  date     | account     | division  |  amount  |
+-----------+-------------+-----------+----------+
| 20190331  | 123         | AB0       | 100      |
+-----------+-------------+-----------+----------+
| 20190331  | 123         | AB1       | 110      |
+-----------+-------------+-----------+----------+
| 20190331  | 123         | AB2       | 120      |
+-----------+-------------+-----------+----------+
| 20190228  | 123         | AB4       | 100      |
+-----------+-------------+-----------+----------+
| 20190228  | 123         | AB1       | 100      |
+-----------+-------------+-----------+----------+
| 20190228  | 123         | AB2       | 100      |
+-----------+-------------+-----------+----------+
| 20190131  | 123         | AB0       | 100      |
+-----------+-------------+-----------+----------+

I ran the following query in Impala, but it returns the same result for the current month and the previous month.

select distinct * from (
SELECT 
sum(amount) over (partition BY account, a.date) AS asset_current,
sum(amount) over (partition BY account, from_unixtime(unix_timestamp(to_date(LAST_DAY(ADD_MONTHS(to_timestamp(data_as_of_date,'yyyyMMdd'),-1))),'yyyy-MM-dd'),'yyyyMMdd')) AS asset_previous,
     account,
     date,
FROM employee_assets a
)x ;

Expected output:

+-----------+-------------+--------------------+----------------------+
|  date     | account     | current_month_sum  |  previous_month_sum  |
+-----------+-------------+--------------------+----------------------+
| 20190331  | 123         | 330                | 300                  |
+-----------+-------------+--------------------+----------------------+
| 20190228  | 123         | 300                | 100                  |
+-----------+-------------+--------------------+----------------------+
| 20190131  | 123         | 100                | 0                    |
+-----------+-------------+--------------------+----------------------+

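I used the following query, but when the previous month's data is not available, it returns the value of the most recent earlier month as previous_month_sum.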

SELECT
    x.*,
    LAG(current_month_sum, 1, 0) OVER(PARTITION BY account ORDER BY adate) previous_month_sum  
FROM (
    SELECT adate, account, SUM(amount) current_month_sum  
    FROM employee_assets
    GROUP BY adate, account
) x
ORDER BY adate DESC

For example: we have no input data for account 123 for 20181231, so previous_month_sum for January should be 0, but the query returns 500 (the amount for November 2018).

Input data:

+-----------+-------------+-----------+----------+
|  date     | account     | division  |  amount  |
+-----------+-------------+-----------+----------+
| 20190331  | 123         | AB0       | 100      |
+-----------+-------------+-----------+----------+
| 20190331  | 123         | AB1       | 110      |
+-----------+-------------+-----------+----------+
| 20190331  | 123         | AB2       | 120      |
+-----------+-------------+-----------+----------+
| 20190228  | 123         | AB4       | 100      |
+-----------+-------------+-----------+----------+
| 20190228  | 123         | AB1       | 100      |
+-----------+-------------+-----------+----------+
| 20190228  | 123         | AB2       | 100      |
+-----------+-------------+-----------+----------+
| 20190131  | 123         | AB0       | 100      |
+-----------+-------------+-----------+----------+
| 20181130  | 123         | ABX       | 500      |
+-----------+-------------+-----------+----------+

The query returns:

+-----------+-------------+--------------------+----------------------+
|  date     | account     | current_month_sum  |  previous_month_sum  |
+-----------+-------------+--------------------+----------------------+
| 20190331  | 123         | 330                | 300                  |
+-----------+-------------+--------------------+----------------------+
| 20190228  | 123         | 300                | 100                  |
+-----------+-------------+--------------------+----------------------+
| 20190131  | 123         | 100                | 500                  |
+-----------+-------------+--------------------+----------------------+
| 20181130  | 123         | 500                | 0                    |
+-----------+-------------+--------------------+----------------------+

Expected output:

+-----------+-------------+--------------------+----------------------+
|  date     | account     | current_month_sum  |  previous_month_sum  |
+-----------+-------------+--------------------+----------------------+
| 20190331  | 123         | 330                | 300                  |
+-----------+-------------+--------------------+----------------------+
| 20190228  | 123         | 300                | 100                  |
+-----------+-------------+--------------------+----------------------+
| 20190131  | 123         | 100                | 0                    |
+-----------+-------------+--------------------+----------------------+
| 20181130  | 123         | 500                | 0                    |
+-----------+-------------+--------------------+----------------------+

Best answer

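You can aggregate in an inner query and use LAG() in the outer query to fetch the previous month's value within each account partition. The three-argument form of LAG() lets you specify a default value for rows that have no preceding row. Note that LAG() returns the preceding row in the partition whatever its date, so this gives the previous calendar month only when no month is missing for the account.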

SELECT
    x.*,
    LAG(current_month_sum, 1, 0) OVER(PARTITION BY account ORDER BY adate) previous_month_sum  
FROM (
    SELECT adate, account, SUM(amount) current_month_sum  
    FROM employee_assets
    GROUP BY adate, account
) x
ORDER BY adate DESC

Note: date is not a good choice for a column name, because it can conflict with a reserved word. I renamed the column to adate in the query.
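The follow-up in the question (a missing 20181231 row producing 500 for January) comes from LAG() taking the preceding row regardless of its date. One way to handle this is to compare a running month number between neighbouring rows and fall back to 0 when they are not consecutive. The following is only a sketch, not tested on Impala here. It assumes adate is a yyyyMMdd string and that each account has at most one date per month, as in the sample data. The names month_no, m and x are introduced for illustration and are not part of the original post.

```sql
SELECT
    adate,
    account,
    current_month_sum,
    CASE
        -- Use the preceding row only when it is exactly one month earlier.
        -- For the first row of an account LAG() is NULL, so the ELSE branch applies.
        WHEN month_no - LAG(month_no) OVER (PARTITION BY account ORDER BY month_no) = 1
        THEN LAG(current_month_sum) OVER (PARTITION BY account ORDER BY month_no)
        ELSE 0
    END AS previous_month_sum
FROM (
    SELECT
        m.*,
        -- Running month counter (year * 12 + month), so December -> January differs by 1.
        CAST(SUBSTR(adate, 1, 4) AS INT) * 12
            + CAST(SUBSTR(adate, 5, 2) AS INT) AS month_no
    FROM (
        -- Same aggregation as in the query above: one row per account and date.
        SELECT adate, account, SUM(amount) AS current_month_sum
        FROM employee_assets
        GROUP BY adate, account
    ) m
) x
ORDER BY adate DESC;
```

With the second set of input data, month_no is 24227 for 20181130 and 24229 for 20190131. The difference is 2, so previous_month_sum for January falls back to 0, while February and March still pick up 100 and 300.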

Regarding "oracle - Find the sum of the current month's and the previous month's values", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58276899/
