postgresql - Postgres - calculating changes in cumulative data

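I'm collecting data from several API sources with Python and loading it into two tables in Postgres.

I then use this data to build reports by joining, grouping, and filtering it. I add thousands of rows every day.

Cost, revenue, and sales are always cumulative: each data point covers the period from t1, the product's start, to t2, the time the data was retrieved.

So the latest pull includes all previous data, going back to t1. t1 and t2 are timestamp without time zone columns in Postgres. I'm currently on Postgres 10.

Example: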

id, vendor_id, product_id, t1, t2, cost, revenue, sales
1, a, a, 2018-01-01, 2018-04-18, 50, 200, 34
2, a, b, 2018-05-01, 2018-04-18, 10, 100, 10
3, a, c, 2018-01-02, 2018-04-18, 12, 100, 9
4, a, d, 2018-01-03, 2018-04-18, 12, 100, 8
5, b, e, 2018-02-25, 2018-04-18, 12, 100, 7

6, a, a, 2018-01-01, 2018-04-17, 40, 200, 30
7, a, b, 2018-05-01, 2018-04-17, 0, 95, 8
8, a, c, 2018-01-02, 2018-04-17, 10, 12, 5
9, a, d, 2018-01-03, 2018-04-17, 8, 90, 4
10, b, e, 2018-02-25, 2018-04-17, 9, 0, 3

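Cost and revenue come from two tables, which I join on vendor_id, product_id, and t2.

Is there a way to walk through all the data, "shift" it, and subtract, so that I get time-series data instead of cumulative data?

Should this be done before storing the data, or is it better done when generating the report?

For reference: at the moment, if I want a report of what changed between two points in time, I run two subqueries. That seems backwards compared to having the data as a time series and simply aggregating over the interval I need.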

with report1 as (select ...),
report2 as (select ...)
select .. from report1 left outer join report2 on ...

Thanks in advance!


Best answer

You can use LAG():

From the Window Functions documentation:

...returns value evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead return default (which must be of the same type as value). Both offset and default are evaluated with respect to the current row. If omitted, offset defaults to 1 and default to null.
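To make the quoted definition concrete, here is a minimal Python sketch of what LAG() does within a single, already ordered partition. The function name `lag` and its list-based interface are illustrative assumptions, not a PostgreSQL API; the sketch only mirrors the documented offset and default behaviour.

```python
def lag(values, offset=1, default=None):
    """Mimic SQL LAG() over one partition whose rows are already ordered.

    For each position i, return the value `offset` rows earlier,
    or `default` when no such row exists in the partition.
    """
    return [
        values[i - offset] if 0 <= i - offset < len(values) else default
        for i in range(len(values))
    ]


# Cumulative cost for one product, ordered by t2 (oldest first).
cumulative_cost = [16, 35, 50]
previous_cost = lag(cumulative_cost)          # offset 1, default None
shifted_by_two = lag(cumulative_cost, 2, 0)   # explicit offset and default
```

The first row has no predecessor, so it receives the default; that is why the query needs a fallback for the first snapshot of each product.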

-- Sample data: cumulative totals per (vendor_id, product_id), one snapshot per t2.
with sample_data as (
        select 1 as id, 'a'::text as vendor_id, 'a'::text as product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 50 as cost, 200 as revenue, 36 as sales
        union all
        select 2 as id, 'a'::text as vendor_id, 'b'::text as product_id, '2018-01-01'::date as t1, '2018-04-18'::date as t2, 55 as cost, 200 as revenue, 34 as sales
        union all
        select 3 as id, 'a'::text as vendor_id, 'a'::text as product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 35 as cost, 150 as revenue, 25 as sales
        union all
        select 4 as id, 'a'::text as vendor_id, 'b'::text as product_id, '2018-01-01'::date as t1, '2018-04-17'::date as t2, 25 as cost, 140 as revenue, 23 as sales
        union all
        select 5 as id, 'a'::text as vendor_id, 'a'::text as product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 16 as cost, 70 as revenue, 12 as sales
        union all
        select 6 as id, 'a'::text as vendor_id, 'b'::text as product_id, '2018-01-01'::date as t1, '2018-04-16'::date as t2, 13 as cost, 65 as revenue, 11 as sales
)
-- Subtract the previous snapshot (by t2) of the same product from each row.
-- The first snapshot has no predecessor, so lag() is null and coalesce
-- falls back to the cumulative value itself.
select sd.*
    , coalesce(cost - lag(cost) over w, cost) as cost_new
    , coalesce(revenue - lag(revenue) over w, revenue) as revenue_new
    , coalesce(sales - lag(sales) over w, sales) as sales_new
from sample_data sd
window w as (partition by vendor_id, product_id order by t2)
order by t2 desc;
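On the question of whether to do this before storing: the same subtraction can also be done in Python at ingestion time. The following is a minimal sketch, assuming each pull yields dicts keyed by the column names from the example and that t2 is an ISO-formatted date string. The name `to_deltas` and the `_new` suffix on output keys are hypothetical choices made to match the query's output columns.

```python
from collections import defaultdict

METRICS = ("cost", "revenue", "sales")


def to_deltas(rows):
    """Turn cumulative snapshots into per-interval changes.

    rows: iterable of dicts with vendor_id, product_id, t2 and the metrics.
    Returns new dicts with an extra <metric>_new key for each metric.
    """
    # Group snapshots by product, like PARTITION BY vendor_id, product_id.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["vendor_id"], row["product_id"])].append(row)

    result = []
    for snapshots in groups.values():
        previous = None
        # ISO date strings sort chronologically, like ORDER BY t2.
        for row in sorted(snapshots, key=lambda r: r["t2"]):
            out = dict(row)
            for m in METRICS:
                # First snapshot has no predecessor: keep the cumulative value.
                out[m + "_new"] = row[m] - previous[m] if previous is not None else row[m]
            result.append(out)
            previous = row
    return result


rows = [
    {"vendor_id": "a", "product_id": "a", "t2": "2018-04-18", "cost": 50, "revenue": 200, "sales": 36},
    {"vendor_id": "a", "product_id": "a", "t2": "2018-04-17", "cost": 35, "revenue": 150, "sales": 25},
    {"vendor_id": "a", "product_id": "a", "t2": "2018-04-16", "cost": 16, "revenue": 70, "sales": 12},
]
deltas = to_deltas(rows)
```

Whether to store the deltas or keep the raw cumulative snapshots and derive deltas at report time is a design choice the answer does not settle. Keeping the raw snapshots leaves the source data intact if a pull is missed or later corrected, at the cost of running the window query in each report.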

Regarding "postgresql - Postgres - calculating changes in cumulative data", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/49898121/
