我有一个由以下人员创建的表:
CREATE TABLE #test_table
(
id INT
,EventName VARCHAR(50)
,HomeTeam VARCHAR(25)
,Metric INT
)
INSERT INTO #test_table VALUES
(1, 'Team A vs Team B', 'Team A', 5),
(2, 'Team A vs Team B', 'Team A', 7),
(3, 'Team C vs Team D', 'Team C', 6),
(4, 'Team Z vs Team A', 'Team Z', 8),
(5, 'Team A vs Team B', 'Team A', 9),
(6, 'Team C vs Team D', 'Team C', 3),
(7, 'Team C vs Team D', 'Team C', 1),
(8, 'Team E vs Team F', 'Team E', 2)
结果是:
id EventName HomeTeam Metric
------------------------------------------
1 Team A vs Team B Team A 5
2 Team A vs Team B Team A 7
3 Team C vs Team D Team C 6
4 Team Z vs Team A Team Z 8
5 Team A vs Team B Team A 9
6 Team C vs Team D Team C 3
7 Team C vs Team D Team C 1
8 Team E vs Team F Team E 2
A 想要计算一个新列 PreviousMetricN
,其中 N 可以是 1、2、3 ...,它显示 Metric
的先前值,但前提是HomeTeam
参与了之前的事件。例如:
id EventName HomeTeam Metric PreviousMetric1 PreviousMetric2
------------------------------------------------------------------------
1 Team A vs Team B Team A 5 NULL NULL
2 Team A vs Team B Team A 7 5 NULL
3 Team C vs Team D Team C 6 NULL NULL
4 Team Z vs Team A Team Z 8 NULL NULL
5 Team A vs Team B Team A 9 8 7
6 Team C vs Team D Team C 3 6 NULL
7 Team C vs Team D Team C 1 3 6
8 Team E vs Team F Team E 2 NULL NULL
我一直在尝试 LAG
的变体,在 PARTITION BY
子句中使用新的分组变量,例如
LAG(Metric) OVER(Partition by (CASE WHEN CHARINDEX(HomeTeam, EventName)>0 THEN 1 ELSE 0 END) ORDER BY id)
但没有成功。如何做到这一点?
编辑: 我也在这里问过 Pandas 这个问题: Pandas shift - get previous value if multiple conditions satisfied
最佳答案
我在这里看不到使用窗口函数和单次扫描表的答案。我们可以按以下方式在单次扫描中执行此查询:
让我们假设您在另一列中有 AwayTeam
。
If you don't have this yet and you wanted to parse it out of
EventData
:
We could use:SUBSTRING(EventData, CHARINDEX(' vs ', EventData) + 4)
I urge you to follow proper normalization and create this as a proper column in your table.
我们的算法是这样运行的:
- 使用
CROSS APPLY
将两个团队相乘(逆轴)作为单独的行
- 使用
LAG
计算之前的Metric
,按合并的Team
列进行分区 - 过滤掉加倍的行,这样我们每个原始行只得到一行
SELECT id, HomeTeam, AwayTeam, Metric, Prev1, Prev2, Prev3
FROM (
SELECT *
,Prev1 = LAG(Metric, 1) OVER (PARTITION BY v.Team ORDER BY id)
,Prev2 = LAG(Metric, 2) OVER (PARTITION BY v.Team ORDER BY id)
,Prev3 = LAG(Metric, 3) OVER (PARTITION BY v.Team ORDER BY id)
-- more of these ......
FROM test_table
CROSS APPLY (VALUES (HomeTeam, 1),(AwayTeam, 0)) AS v(Team,IsHome)
) AS t
WHERE IsHome = 1
-- ORDER BY id --if necessary
重要的是,我们可以在不使用多种不同的排序、分区或排序,并且不使用自连接的情况下做到这一点。只需一次扫描。
结果:
关于SQL - 如果满足使用多个先前列的条件,则滞后获取先前的值,我们在Stack Overflow上找到一个类似的问题: https://stackoverflow.com/questions/65905861/