I have a table with the following data:
ID SEQ EFFDAT
------- --------- -----------------------
1024 1 01/07/2010 12:00:00 AM
1024 3 18/04/2017 12:00:00 AM
1024 2 01/08/2017 12:00:00 AM
When I run the following query, I get the wrong maximum sequence, but I still get the correct maximum effective date.
Query:
SELECT
max(seq) over (partition by id order by EFFDAT desc) maxEffSeq,
partitionByTest.*,
max(EFFDAT) over (partition by (id) order by EFFDAT desc ) maxeffdat
FROM partitionByTest;
Output:
MAXEFFSEQ ID SEQ EFFDAT MAXEFFDAT
---------- ---------- ---------- ------------------------ ------------------------
2 1024 2 01/08/2017 12:00:00 AM 01/08/2017 12:00:00 AM
3 1024 3 18/04/2017 12:00:00 AM 01/08/2017 12:00:00 AM
3 1024 1 01/07/2010 12:00:00 AM 01/08/2017 12:00:00 AM
If I remove the ORDER BY from the max(seq) window in the query, I get the correct output.
Query:
SELECT max(seq) over (partition by id ) maxEffSeq, partitionByTest.*,
max(EFFDAT) over (partition by (id) order by EFFDAT desc ) maxeffdat
FROM partitionByTest;
Output:
MAXEFFSEQ ID SEQ EFFDAT MAXEFFDAT
---------- ---------- ---------- ------------------------ ------------------------
3 1024 2 01/08/2017 12:00:00 AM 01/08/2017 12:00:00 AM
3 1024 3 18/04/2017 12:00:00 AM 01/08/2017 12:00:00 AM
3 1024 1 01/07/2010 12:00:00 AM 01/08/2017 12:00:00 AM
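I know that an ORDER BY clause is not needed when using the MAX function. But I am curious how ORDER BY works inside a PARTITION BY window, and why using the ORDER BY clause gives the wrong result for the sequence but the correct result for the date.
Best answer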
Adding order by also implies a windowing clause, and since you did not specify one, you get the default, so you are effectively doing:
max(seq) over (
partition by id
order by EFFDAT desc
range between unbounded preceding and current row
)
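The equivalence can be checked with a small script. This is only a sketch under stated assumptions: it uses Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or later applies the same default frame when an ORDER BY is present), the table is a hypothetical in-memory copy of partitionByTest, and the dates are stored as ISO text so they sort chronologically.

```python
import sqlite3

# Hypothetical in-memory copy of the partitionByTest table from the question.
# Dates are stored as ISO-8601 text so that text ordering is chronological.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partitionByTest (id INTEGER, seq INTEGER, effdat TEXT)")
conn.executemany(
    "INSERT INTO partitionByTest VALUES (?, ?, ?)",
    [(1024, 1, "2010-07-01"), (1024, 3, "2017-04-18"), (1024, 2, "2017-08-01")],
)

# Compare the implied default frame with the same frame written out explicitly.
rows = conn.execute(
    """
    SELECT seq,
           effdat,
           MAX(seq) OVER (PARTITION BY id ORDER BY effdat DESC) AS implicit_max,
           MAX(seq) OVER (
               PARTITION BY id
               ORDER BY effdat DESC
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS explicit_max
    FROM partitionByTest
    ORDER BY effdat DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Both columns should agree on every row, which is consistent with the order by silently bringing the default window with it.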
If you look at how the data appears when it is sorted the same way, by EFFDAT in descending order:
select partitionbytest.*,
count(*) over (partition by id order by effdat desc) range_rows,
max(seq) over (partition by id order by effdat desc) range_max_seq,
count(*) over (partition by id) id_rows,
max(seq) over (partition by id) id_max_seq
from partitionbytest
order by effdat desc;
ID SEQ EFFDAT RANGE_ROWS RANGE_MAX_SEQ ID_ROWS ID_MAX_SEQ
---------- ---------- ---------- ---------- ------------- ---------- ----------
1024 2 2017-08-01 1 2 3 3
1024 3 2017-04-18 2 3 3 3
1024 1 2010-07-01 3 3 3 3
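Then it becomes clearer. I've included the equivalent analytic counts so you can also see how many rows are being considered, with and without the order by clause. With the order by clause, the window for the first row contains only that row (RANGE_ROWS is 1), so the maximum is 2; the window for the second row contains two rows, so the maximum becomes 3. The date still comes out right because the rows are ordered by EFFDAT descending, so the latest date is always the first row in every window. Without the order by clause it always considers all values for that ID, so it sees 3 as the highest for all of them. See the documentation for analytic functions for more detail on how the window is determined, in particular: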
The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.
and
You cannot specify [windowing_clause] unless you have specified the order_by_clause.
and
If you omit the windowing_clause entirely, then the default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
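If the order by has to stay in the window for some reason, one option is to spell out a frame that covers the whole partition. The following is a minimal sketch, again assuming SQLite (3.25 or later) via Python's sqlite3 module as a stand-in for Oracle, with a hypothetical in-memory copy of partitionByTest and ISO text dates; the column aliases are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory copy of the partitionByTest table from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE partitionByTest (id INTEGER, seq INTEGER, effdat TEXT)")
conn.executemany(
    "INSERT INTO partitionByTest VALUES (?, ?, ?)",
    [(1024, 1, "2010-07-01"), (1024, 3, "2017-04-18"), (1024, 2, "2017-08-01")],
)

# default_frame_max uses the implied frame (up to the current row);
# full_frame_max keeps the ORDER BY but widens the frame to the whole partition.
rows = conn.execute(
    """
    SELECT seq,
           effdat,
           MAX(seq) OVER (PARTITION BY id ORDER BY effdat DESC) AS default_frame_max,
           MAX(seq) OVER (
               PARTITION BY id
               ORDER BY effdat DESC
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS full_frame_max
    FROM partitionByTest
    ORDER BY effdat DESC
    """
).fetchall()

for row in rows:
    print(row)
```

With the explicit frame every row sees all three rows for the ID, so the maximum should be 3 throughout, the same as when the order by is dropped from the window.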
Regarding "sql - What is the role of ORDER BY in a PARTITION BY function?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/51761481/