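I thought I understood how PERCENTILE_CONT works, but I get different results when I calculate the percentiles by hand and when I call the function.
Here is the data set: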
305.7884804
350
373.3728865
384.2094838
410.8603441
414.9842786
455.3545205
550
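To calculate the 25th percentile, I added 350 and 373.3728865 and divided by 2. The result is 361.68644325.
To calculate the 50th percentile, I added 384.2094838 and 410.8603441 and divided by 2. The result is 397.53491395.
To calculate the 75th percentile, I added 414.9842786 and 455.3545205 and divided by 2. The result is 435.16939955.
However, when I run this SQL: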
select
percentile_cont(0.25) within group(order by YEAR_2_FTE ASC),
percentile_cont(0.5) within group(order by YEAR_2_FTE ASC),
percentile_cont(0.75) within group(order by YEAR_2_FTE ASC) from sr_database
where firm_id=999;
I get these results:
25%: 367.529664875
50%: 397.53491395
75%: 425.076839075
What am I missing?
Best answer
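You correctly took the midpoint between the 4th and 5th values to calculate the 50th percentile. But for the 25th percentile (and likewise for the 75th), taking the midpoint between the 2nd and 3rd values is incorrect. This is easiest to see in a picture:
Plot the 8 values as equally spaced points and divide the span from the first point to the last into 4 segments of equal length. The 25th and 75th percentile marks then fall between the 2nd and 3rd values and between the 6th and 7th values, closer to the 3rd and the 6th respectively, not at the midpoints.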
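In place of the picture, here is a minimal Python sketch of where each percentile mark lands among the 8 rows. It uses the row-number formula RN = 1 + P*(N-1) from the Oracle documentation quoted further down; the helper name `row_number` is made up for this illustration.

```python
import math

def row_number(p: float, n: int) -> float:
    """1-based fractional row position RN = 1 + P*(N-1)."""
    return 1 + p * (n - 1)

n = 8  # number of rows in the question's data set
for p in (0.25, 0.5, 0.75):
    rn = row_number(p, n)
    lower, upper = math.floor(rn), math.ceil(rn)
    weight_upper = rn - lower  # share of the result taken from the upper row
    print(f"P={p}: RN={rn}, between rows {lower} and {upper}, "
          f"weight on row {upper} = {weight_upper}")
```

Only P = 0.5 gives RN = 4.5, exactly halfway between two rows, which is why the plain average is right for the median but not for the quartiles.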
The Oracle documentation describes how these are calculated:
The result of PERCENTILE_CONT is computed by linear interpolation between values after ordering them. Using the percentile value (P) and the number of rows (N) in the aggregation group, we compute the row number we are interested in after ordering the rows with respect to the sort specification. This row number (RN) is computed according to the formula RN = (1+ (P*(N-1)). The final result of the aggregate function is computed by linear interpolation between the values from rows at row numbers CRN = CEILING(RN) and FRN = FLOOR(RN).
In your example, for the 25th percentile, P = 0.25 and N = 8, so:
RN = (1 + 0.25*(8-1))
= 2.75
and therefore CRN = 3 and FRN = 2.
The documentation goes on to say:
The final result will be:
If (CRN = FRN = RN) then the result is
(value of expression from row at RN)
Otherwise the result is
(CRN - RN) * (value of expression for row at FRN) + (RN - FRN) * (value of expression for row at CRN)
Plugging in the values above, the "Otherwise..." branch applies, which gives:
Result = (3 - 2.75) * 350 + (2.75 - 2) * 373.3728865
= 367.529664875
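The formula quoted above can be sketched as a small Python function to check the arithmetic. This is only an illustration of the documented rule, not Oracle's implementation; the function name `percentile_cont` and the list `data` are stand-ins for the SQL function and the YEAR_2_FTE column.

```python
import math

def percentile_cont(values, p):
    """Interpolate linearly between rows FRN and CRN, as in the quoted formula.

    Hypothetical helper for illustration; it mirrors the documented rule only.
    """
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    n = len(ordered)
    rn = 1 + p * (n - 1)                 # 1-based fractional row number
    frn, crn = math.floor(rn), math.ceil(rn)
    if frn == crn:                       # RN falls exactly on a row
        return ordered[frn - 1]
    # Otherwise weight the two neighbouring rows by their distance from RN.
    return (crn - rn) * ordered[frn - 1] + (rn - frn) * ordered[crn - 1]

data = [305.7884804, 350, 373.3728865, 384.2094838,
        410.8603441, 414.9842786, 455.3545205, 550]

print(percentile_cont(data, 0.25))  # about 367.53, matching the query output
```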
Similarly, for the 75th percentile:
RN = (1+ 0.75*(8-1))
= 6.25
CRN = 7
FRN = 6
Result = (7 - 6.25) * 414.9842786 + (6.25 - 6) * 455.3545205
= 425.076839075
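As an independent cross-check outside the database, Python's standard library should give the same three numbers. The sketch assumes Python 3.8 or later and relies on `statistics.quantiles` with `method="inclusive"`, which, as far as I can tell, interpolates at the same 1 + P*(N-1) positions; `data` is just the data set from the question copied into a list.

```python
import statistics

data = [305.7884804, 350, 373.3728865, 384.2094838,
        410.8603441, 414.9842786, 455.3545205, 550]

# n=4 asks for the three quartile cut points; "inclusive" treats the data as
# spanning the full range, so the minimum is the 0th and the maximum the 100th
# percentile.
q25, q50, q75 = statistics.quantiles(data, n=4, method="inclusive")

for label, value in (("25%", q25), ("50%", q50), ("75%", q75)):
    print(f"{label}: {value:.9f}")
```

The default `method="exclusive"` places the cut points differently, so it would not reproduce the PERCENTILE_CONT results.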
On the topic of unexpected results when calling the PERCENTILE_CONT function in SQL, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/37658923/