python - SQL analog of the Pandas diff() function (first discrete difference) [LAG function]

Tags: python sql oracle pandas difference

I am looking for a way to write a SQL query that applies the first discrete difference to the original series. In Python this is very easy with the Pandas .diff() method:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0,100,size=(10, 2)), columns=list('AB'))

df["diff_A"]=df["A"].diff()
df["diff_B"]=df["B"].diff()

print(df)

The desired output is shown in the "diff_A" and "diff_B" columns:

    A   B  diff_A  diff_B
0  36  14     NaN     NaN
1  32  13    -4.0    -1.0
2  31  87    -1.0    74.0
3  58  88    27.0     1.0
4  44  34   -14.0   -54.0
5   2  43   -42.0     9.0
6  15  94    13.0    51.0
7  46  74    31.0   -20.0
8  60   9    14.0   -65.0
9  43  57   -17.0    48.0
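For reference, `diff()` with its default `periods=1` computes x[t] - x[t-1] for each row and leaves the first row empty (NaN), because it has no predecessor. The plain-Python sketch below spells that out; `first_difference` is a hypothetical helper for illustration, not a pandas function, and it uses `None` where pandas shows NaN. It reproduces the diff_A column from the sample output above.

```python
from typing import List, Optional


def first_difference(values: List[float], periods: int = 1) -> List[Optional[float]]:
    """Return values[i] - values[i - periods] for each position.

    The first `periods` entries have no predecessor and are None
    (pandas displays these as NaN). Assumes periods >= 1.
    """
    return [
        None if i < periods else values[i] - values[i - periods]
        for i in range(len(values))
    ]


# Column A from the sample output in the question.
a = [36, 32, 31, 58, 44, 2, 15, 46, 60, 43]
print(first_difference(a))
# [None, -4, -1, 27, -14, -42, 13, 31, 14, -17]
```

The "value minus the previous row's value" formulation is exactly what the SQL answers below express with `LAG`.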

I use Oracle, but I would definitely prefer a clean ANSI solution.

Best answer

IIUC, you can use the analytic LAG function:

with v as (
  select rowid as rn, a, b from tab
)
select
  a, b,
  a - lag(a, 1) over(order by rn) as diff_a,
  b - lag(b, 1) over(order by rn) as diff_b
from v
order by rn;
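To try the query without an Oracle instance, one option is SQLite through Python's standard `sqlite3` module. This is a sketch under that assumption, not Oracle itself: SQLite also exposes an implicit `rowid` and supports `LAG` from version 3.25, but its `rowid` is an integer assigned at insert time, whereas Oracle's `ROWID` is a physical row address. The table name `tab` comes from the answer; the data is columns A and B from the question's sample output.

```python
import sqlite3

# Columns A and B from the question's sample output, in their original order.
ROWS = [(36, 14), (32, 13), (31, 87), (58, 88), (44, 34),
        (2, 43), (15, 94), (46, 74), (60, 9), (43, 57)]

# Same query as the answer above (without the trailing semicolon).
QUERY = """
with v as (
  select rowid as rn, a, b from tab
)
select
  a, b,
  a - lag(a, 1) over (order by rn) as diff_a,
  b - lag(b, 1) over (order by rn) as diff_b
from v
order by rn
"""


def lag_diff(rows):
    # In-memory SQLite database; window functions need SQLite >= 3.25.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tab (a integer, b integer)")
        conn.executemany("insert into tab (a, b) values (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in lag_diff(ROWS):
    print(row)  # first row is (36, 14, None, None): LAG returns NULL there
```

The SQL `NULL` in the first row (a `None` in Python) plays the role of pandas' NaN.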

PS: It is better to order by a real column (such as a date), because rowid can change.

For example:

select
  a, b,
  a - lag(a, 1) over(order by inserted) as diff_a,
  b - lag(b, 1) over(order by inserted) as diff_b
from tab;
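A small sketch of why ordering by a real column helps, again assuming SQLite via Python's `sqlite3` as a stand-in for Oracle. The rows below are hypothetical and are stored deliberately out of chronological order; `inserted` is the column name used in the answer, here holding ISO-8601 date strings, which sort correctly as text. An outer `ORDER BY` is added so the printed order is also defined: the window's `ORDER BY` only controls which row `LAG` treats as "previous".

```python
import sqlite3

# Hypothetical rows (a, b, inserted), stored out of chronological order.
ROWS = [
    (58, 88, "2018-01-04"),
    (36, 14, "2018-01-01"),
    (31, 87, "2018-01-03"),
    (32, 13, "2018-01-02"),
]

QUERY = """
select
  inserted, a, b,
  a - lag(a, 1) over (order by inserted) as diff_a,
  b - lag(b, 1) over (order by inserted) as diff_b
from tab
order by inserted
"""


def diffs_by_inserted(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tab (a integer, b integer, inserted text)")
        conn.executemany("insert into tab (a, b, inserted) values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in diffs_by_inserted(ROWS):
    print(row)
```

Because `LAG` follows `inserted` rather than storage order, the differences match the chronological series even though the rows were inserted in a different order.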

@MatBailie has posted a very good explanation:

Data sets in SQL are unordered. For deterministic results from LAG(), always use a sufficient ORDER BY clause. (If no such field exists, one should be created when or before the data is inserted into a SQL data set. The unordered nature of a SQL data set is what makes a large number of scalability and optimisation options available.)
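One way to read "sufficient ORDER BY" is that the window ordering must be total: if two rows tie on the ordering column, the database may place either one first, and their differences change accordingly. The sketch below assumes SQLite via Python's `sqlite3` and uses hypothetical rows in which ids 2 and 3 share a timestamp; adding the unique `id` column as a tie-breaker is one common fix, not the only one.

```python
import sqlite3

# Hypothetical rows (id, a, inserted); ids 2 and 3 share the same timestamp.
ROWS = [
    (1, 36, "2018-01-01 10:00"),
    (2, 32, "2018-01-01 10:05"),
    (3, 31, "2018-01-01 10:05"),
    (4, 58, "2018-01-01 10:10"),
]

# With 'order by inserted' alone, ids 2 and 3 could come out in either order,
# giving diff_a values of (-4, -1) or (-5, 1) for them. The unique id as a
# tie-breaker makes the window ordering total, so the result is deterministic.
QUERY = """
select
  id, a,
  a - lag(a, 1) over (order by inserted, id) as diff_a
from tab
order by inserted, id
"""


def deterministic_diffs(rows):
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table tab (id integer primary key, a integer, inserted text)")
        conn.executemany("insert into tab (id, a, inserted) values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in deterministic_diffs(ROWS):
    print(row)
```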


PS: Windowing functions were added in ANSI/ISO Standard SQL:2003 and then extended in ANSI/ISO Standard SQL:2008. Microsoft was late to this game: DB2, Oracle, Sybase, PostgreSQL and other products have had full implementations for years, while SQL Server did not catch up until SQL Server 2012.

Regarding python - SQL analog of the Pandas diff() function (first discrete difference) [LAG function], a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48164751/
