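postgresql - Postgres - Calculating changes in cumulative data

Tags: postgresql

I am collecting data from several API sources with Python and adding it to 2 tables in Postgres.

I then use this data to build reports, joining and grouping/filtering the data. I add thousands of rows every day.

Cost, revenue, and sales are always cumulative, meaning each data point covers everything since t1 for that product, and t2 is the time of the data pull.

So the latest data pull includes all earlier data, all the way back to t1. t1 and t2 are timestamp without time zone columns in Postgres. I am currently using Postgres 10.

Example: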

id, vendor_id, product_id, t1, t2, cost, revenue, sales
1, a, a, 2018-01-01, 2018-04-18, 50, 200, 34
2, a, b, 2018-05-01, 2018-04-18, 10, 100, 10
3, a, c, 2018-01-02, 2018-04-18, 12, 100, 9
4, a, d, 2018-01-03, 2018-04-18, 12, 100, 8
5, b, e, 2018-25-02, 2018-04-18, 12, 100, 7

6, a, a, 2018-01-01, 2018-04-17, 40, 200, 30
7, a, b, 2018-05-01, 2018-04-17, 0, 95, 8
8, a, c, 2018-01-02, 2018-04-17, 10, 12, 5
9, a, d, 2018-01-03, 2018-04-17, 8, 90, 4
10, b, e, 2018-25-02, 2018-04-17, 9, 0-, 3

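Cost and revenue come from two tables, which I join on vendor_id, product_id, and t2.

Is there a way to go through all the data, "shift" it, and subtract, so that I get time-series data instead of cumulative data?

Should this be done before storing, or is it better done when generating the report?

For reference, at the moment, if I want a report of what changed between two points in time, I run two subqueries, but that seems backwards compared with keeping the data as a time series and aggregating only the required interval.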

with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...

Thanks in advance!


Best answer

You can use LAG():

From the PostgreSQL documentation on Window Functions:

...returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.

with sample_data as (
        select 1 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 50 as cost, 200 as revenue, 36 as sales
        union all
        select 2 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 55 as cost, 200 as revenue, 34 as sales
        union all
        select 3 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 35 as cost, 150 as revenue, 25 as sales
        union all
        select 4 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 25 as cost, 140 as revenue, 23 as sales
        union all
        select 5 as id, 'a'::text vendor_id, 'a'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 16 as cost, 70 as revenue, 12 as sales
        union all
        select 6 as id, 'a'::text vendor_id, 'b'::text product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 13 as cost, 65 as revenue, 11 as sales
)
-- For each (vendor_id, product_id), subtract the previous snapshot's cumulative value.
-- The earliest snapshot has no previous row, so coalesce falls back to the value itself.
select sd.*
    , coalesce(cost - lag(cost) over w, cost) as cost_new
    , coalesce(revenue - lag(revenue) over w, revenue) as revenue_new
    , coalesce(sales - lag(sales) over w, sales) as sales_new
from sample_data sd
window w as (partition by vendor_id, product_id order by t2)
order by t2 desc;
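
On the question of doing this before storing: since the data is already collected with Python, the same subtraction could in principle be done there. The following is only a minimal sketch under assumptions, not a drop-in implementation. The rows are copied from the sample_data CTE above, and `to_deltas` and `interval_total` are hypothetical helper names introduced for illustration. It mirrors the LAG() query: group by (vendor_id, product_id), order by t2, and subtract the previous snapshot.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# Cumulative snapshots copied from the sample_data CTE:
# (vendor_id, product_id, t2, cost, revenue, sales)
snapshots = [
    ("a", "a", date(2018, 4, 18), 50, 200, 36),
    ("a", "b", date(2018, 4, 18), 55, 200, 34),
    ("a", "a", date(2018, 4, 17), 35, 150, 25),
    ("a", "b", date(2018, 4, 17), 25, 140, 23),
    ("a", "a", date(2018, 4, 16), 16, 70, 12),
    ("a", "b", date(2018, 4, 16), 13, 65, 11),
]


def to_deltas(rows):
    """Hypothetical helper: turn cumulative rows into per-snapshot changes."""
    out = []
    # Sort by partition key and then t2, like PARTITION BY ... ORDER BY t2.
    ordered = sorted(rows, key=itemgetter(0, 1, 2))
    for (vendor, product), group in groupby(ordered, key=itemgetter(0, 1)):
        prev = None
        for _, _, t2, *totals in group:
            if prev is None:
                # First snapshot: no previous row, keep the value itself
                # (the same fallback as coalesce in the SQL query).
                delta = tuple(totals)
            else:
                delta = tuple(cur - old for cur, old in zip(totals, prev))
            out.append((vendor, product, t2, *delta))
            prev = totals
    return out


def interval_total(rows, vendor, product, start, end):
    """Hypothetical helper: sum the deltas of one product over [start, end]."""
    picked = [r[3:] for r in rows
              if r[0] == vendor and r[1] == product and start <= r[2] <= end]
    return tuple(sum(col) for col in zip(*picked))


deltas = to_deltas(snapshots)
for row in deltas:
    print(row)

# Change for product a between 2018-04-17 and 2018-04-18 (inclusive).
print(interval_total(deltas, "a", "a", date(2018, 4, 17), date(2018, 4, 18)))
```

With the deltas stored, a report over an interval becomes a plain sum over the rows in that interval, instead of two subqueries joined together. The trade-off is that a late or corrected snapshot means recomputing the deltas after it, which is one reason to keep the raw cumulative rows and derive the deltas at report time with LAG().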

Regarding postgresql - Postgres - Calculating changes in cumulative data, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/49898121/
