python - Pandas rolling vs. SciPy kurtosis - serious numerical inaccuracy

Tags: python pandas

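First of all, I apologize for the clearly non-minimal example given below. I am fully aware that it does not meet SO's minimal-reproducibility requirement, but after hours of experimenting to reproduce the problem, it seems to me that it really only shows up when the computation runs over at least several hundred values.

I have a data frame with millions of values, and I want to compute the rolling kurtosis of each column. Initially I used pd.rolling.kurt: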

df.rolling(20, min_periods=3).kurt(bias=False)

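But I noticed two serious problems with this method:

  1. The accuracy is unsatisfactory; although the pandas method gives roughly acceptable results, deviations on the order of 1e-4 are hard to accept for my use case;
  2. More worrying are the kurtosis values that frequently "explode": they suddenly start to deviate by +/-10,000 for no apparent reason, completely distorting the expected output.

I created three series, s1, s2 and s3, with 300, 600 and 900 values respectively. (The assignments with the exact values are at the end of this post, to keep the clutter out of the way.) The three series are slices of one column of the data frame. The slices were created with the last position fixed, i.e. s1 holds the values from N-299 to N, s2 from N-599 to N, and s3 from N-899 to N. Running pd.rolling.kurt on the three series and printing the tail of the result (where the problem I want to discuss shows up) gives the following: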

>>> s1.rolling(20,min_periods=3).kurt().tail(10)
290     9.591067
291     9.591067
292     9.591067
293     9.591067
294    19.663666
295    14.872262
296    14.147157
297    16.716964
298     7.032522
299    19.983796
>>> s2.rolling(20,min_periods=3).kurt().tail(10)
590     9.591067
591     9.591067
592     9.591067
593     9.591067
594    19.663666
595    14.872262
596    14.147157
597    16.716964
598     7.032522
599    19.983796
>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890         9.591071
891         9.591071
892         9.591071
893         9.591071
894        19.663685
895        15.248361
896        40.444894
897      1368.233241
898    251407.375343
899    902540.031652
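
For anyone who wants to reproduce the setup on their own data, the fixed-end slicing described above can be sketched as follows. The `column` series and the `tail_slice` helper are hypothetical stand-ins, not part of the original post; the index is reset so that the labels run from 0, matching the 290-299, 590-599 and 890-899 labels in the outputs.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one column of the real data frame.
column = pd.Series(np.arange(1000, dtype=float))


def tail_slice(col: pd.Series, n: int) -> pd.Series:
    """Return the last n values of col, re-indexed from 0."""
    return col.iloc[-n:].reset_index(drop=True)


s1 = tail_slice(column, 300)
s2 = tail_slice(column, 600)
s3 = tail_slice(column, 900)

# All three slices end on the same values, so the last rolling
# windows see identical data in each of them.
print(s1.tail(3).to_list(), s2.tail(3).to_list(), s3.tail(3).to_list())
```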

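I performed the same calculation in Excel, and for the last ten indices the kurtosis values should be as follows (I use the notation 290 / 590 / 890 to save space: the values at indices 290-299, 590-599 and 890-899 of the three output series should be identical):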

290 / 590 / 890      9.591067361
291 / 591 / 891      9.591067361
292 / 592 / 892      9.591067361
293 / 593 / 893      9.591067361
294 / 594 / 894      19.66366573
295 / 595 / 895      14.87226197
296 / 596 / 896      14.14715754
297 / 597 / 897      16.7169886
298 / 598 / 898      7.037037037
299 / 599 / 899      20
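
For reference, the Excel figures above are consistent with the bias-corrected sample excess kurtosis, which is the quantity that Excel's KURT function and `scipy.stats.kurtosis(..., bias=False)` compute. Below is a minimal sketch of that formula. The name `sample_excess_kurtosis` is made up for illustration, and the test window (a single non-zero value among 20) is an assumption about what the final window of the series looks like.

```python
import numpy as np


def sample_excess_kurtosis(x):
    """Bias-corrected sample excess kurtosis (needs at least 4 values)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        return float("nan")
    d = x - x.mean()                      # deviations from the mean
    s2 = (d ** 2).sum() / (n - 1)         # sample variance
    if s2 == 0:
        return float("nan")               # undefined for a constant window
    z4 = ((d ** 2 / s2) ** 2).sum()       # sum of ((x - mean) / s) ** 4
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# A window of 20 values with exactly one non-zero entry.
window = np.zeros(20)
window[0] = -9.999994999391884e-07
print(sample_excess_kurtosis(window))     # 20 up to rounding error
```

With n = 20 and a single non-zero value the formula evaluates to 20 regardless of the size of that value, which is why the last row of the reference table is a round number.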

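Looking at the output of pd.rolling.kurt, we see that the first two outputs are identical to each other, although they do not match the values I computed in Excel (see indices 297 to 299). The bigger problem, however, is in the third output, where the values explode, as if the total number of values in the series somehow affected the kurtosis, even though in all three cases I used a rolling window of 20 and a minimum of 3 required observations. In theory, if my understanding is correct, this means that nothing except the current row and the previous 19 rows should influence the kurtosis output. I am puzzled as to how these "exploding" values arise.

I then recomputed the kurtosis of the same series using scipy.stats.kurtosis, which gave me the following output: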

>>> s1.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
290     9.591067
291     9.591067
292     9.591067
293     9.591067
294    19.663666
295    14.872262
296    14.147158
297    16.716989
298     7.037037
299    20.000000
>>> s2.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
590     9.591067
591     9.591067
592     9.591067
593     9.591067
594    19.663666
595    14.872262
596    14.147158
597    16.716989
598     7.037037
599    20.000000
>>> s3.rolling(20,min_periods=3).apply(lambda x: kurtosis(x, bias=False)).tail(10)
890     9.591067
891     9.591067
892     9.591067
893     9.591067
894    19.663666
895    14.872262
896    14.147158
897    16.716989
898     7.037037
899    20.000000

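This computes the kurtosis perfectly. However, the .apply(lambda x: kurtosis(x, ...)) construct is shockingly inefficient compared with the vectorized pandas method, pushing the total processing time for the whole data frame from a few minutes to well over an hour! I am fully aware that built-in vectorized solutions often favor speed over numerical precision, which could explain the first problem listed above; but as for the second problem (the "exploding" values), I see no reason for it at all.

Is there any way to compute the kurtosis efficiently without the values diverging and invalidating my entire output?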


Series definitions

Here are the exact values I used to compute the outputs above:

s1 = pd.Series([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0001499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0
,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

s2 = pd.Series([0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.00
01499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

s3 = pd.Series([0.0006613932897393013,0.0002659978876289742,0.000658737582405648,0.0005623339888467145,0.0008417590777197284,0.000542090011101782,0.0007813756301534222,0.0003713395103963933,0.0001847566192768637,0.0005892778635844672,-0.0001955367110279687,0.0004436264576506058,0.000302660947173135,0.0007556577955957223,0.0004099113835531532,0.0002143017625986564,1.052211101549051e-05,6.481751166152551e-05,6.615670911548045e-05,-2.169766854576383e-05,-1.302819997635433e-05,-7.303052044212008e-06,-0.1163297855507419,-0.06335289603465369,-0.03314811069814094,-0.01697505737063765,-0.008591697883893402,-0.004342398361182662,-0.002157940126839023,-0.001100682037128825,-0.0005507856703497119,-0.0002554269710891206,-0.0001277329565522002,-8.395111298446951e-05,-2.189884089509773e-05,-1.094960028496637e-05,-5.479844975342307e-06,-2.739933748392279e-06,-1.369969689294177e-06,-6.799856523827107e-07,-3.399929995978179e-07,-1.79996340600251e-07,-7.999838400850306e-08,-3.999919442393075e-08,-2.999939675042158e-08,-2.007979819879551e-05,-1.004005030070562e-05,-5.52007060169889e-06,-2.760046727695654e-06,9.150125677134498e-06,4.580031464668292e-06,2.2900078662783e-06,1.150001972312828e-06,5.700004873407606e-07,2.80000120302654e-07,1.50000032247295e-07,7.000000733862829e-08,3.000000181016647e-08,2.000000056662899e-08,1.00000003333145e-08,1.000000011126989e-08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0001499887511247459,-7.499156348433101e-05,-3.699790962233055e-05,-1.899945851585629e-05,-8.999869502079515e-06,-4.999962500264377e-06,-1.999992000039351e-06,-9.999974999814318e-07,-9.999984999603102e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.699983850190338e-05,-8.999878501628346e-06,-3.999972000122605e-06,-1.999992000039351e-06,-9.999974999814318e-07,0.0003669319382432873,-0.0001849488621671012,-9.198730581664589e-05,-4.499687272496313e-05,0.0009075453820856781,0.0004854184782060238,-0.000720221831477389,-0.000359805708801156,-0.0001799514136040646,-8.998785170075082e-05,-5.999640023402946e-05,-1.9999600008734e-05,-6.999954500263924e-06,-1.999995999958864e-06,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.001201278176363365,-0.0008013581550363867,-0.0002669288650428971,-8.89921242557729e-05,-2.899914452727788e-05,-9.99990000099588e-06,-2.999989500049026e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0005218638053935734,-0.0004638654873286288,-3.799851806232993e-05,-1.299982450270071e-05,-4.999977500118572e-06,-9.999984999603102e-07,-9.999994999391884e-07,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])

Best answer

It looks like a bug in older Pandas versions. I can reproduce it on an old installation with Python 3.6.2 64-bit on win32, Pandas 1.0.3 and numpy 1.15.4:

>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890         9.591071
891         9.591071
892         9.591071
893         9.591071
894        19.663685
895        15.248361
896        40.444894
897      1368.233241
898    251407.375343
899    902540.031652
dtype: float64

It appears to be fixed on my newer installation with Python 3.8.4 64-bit, Pandas 1.2.2 and numpy 1.20.1:

>>> s3.rolling(20,min_periods=3).kurt().tail(10)
890     9.591067
891     9.591067
892     9.591067
893     9.591067
894    19.663666
895    14.872262
896    14.147158
897    16.716989
898     7.037037
899    20.000000
dtype: float64

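Both installations are on the same Windows 10 machine.

I cannot say which component (Pandas or numpy) is responsible. Since your test with scipy.stats.kurtosis gave correct results, I would suspect Pandas, but without further analysis by a Pandas expert (which I am not), I cannot be sure.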
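
One plausible mechanism, offered as an assumption and not verified against the Pandas source, is that a rolling implementation keeps running power sums and updates them by adding the value that enters the window and subtracting the value that leaves it. Rounding error introduced while a large value is inside the window then stays in the running sum after that value has left. That would fit the observation that only s3 explodes: it contains values around -0.116 early on, whose fourth power (about 1.8e-4) is some 20 orders of magnitude larger than the fourth powers of the ~1e-6 values near the end, more than float64 can resolve. The toy sketch below shows the effect on a sliding sum of squares; both function names are made up for illustration.

```python
def rolling_sum_sq_online(values, window):
    """Sliding sum of squares maintained by add/remove updates."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v * v                  # add the value entering the window
        if i >= window:
            old = values[i - window]
            running -= old * old          # remove the value leaving the window
        out.append(running)
    return out


def rolling_sum_sq_exact(values, window):
    """Sliding sum of squares recomputed from scratch for every window."""
    out = []
    for i in range(len(values)):
        current = values[max(0, i - window + 1): i + 1]
        out.append(sum(v * v for v in current))
    return out


values = [1e8, 1.0, 1.0, 1.0, 1.0, 1.0]
online = rolling_sum_sq_online(values, 3)
exact = rolling_sum_sq_exact(values, 3)

# From index 3 on, the large value is no longer in the window.
print(online[3:])   # [0.0, 0.0, 0.0]: the small squares were rounded away
print(exact[3:])    # [3.0, 3.0, 3.0]
```

The online version never recovers, even though every later window contains only ones. A kurtosis built from such sums would divide by a badly wrong variance, which is one way values could blow up.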

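IMHO, the most sensible solution is to upgrade your system, or to add a fresh, separate Python installation with a recent Pandas version.

Regarding python - Pandas rolling vs. SciPy kurtosis - serious numerical inaccuracy, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/66477764/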
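
If upgrading is not an option, one possible workaround is to recompute every window from scratch, but in a single vectorized call instead of one Python call per row. The sketch below assumes numpy 1.20 or later (for `sliding_window_view`), a series without NaN values, and enough memory for temporaries of shape (rows, window). The helper name `rolling_kurt_windows` is hypothetical. It defaults to `min_periods=4` because a bias-corrected kurtosis needs at least four observations.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view
from scipy.stats import kurtosis


def rolling_kurt_windows(s: pd.Series, window: int = 20,
                         min_periods: int = 4) -> pd.Series:
    """Rolling bias-corrected kurtosis, each window computed independently."""
    values = s.to_numpy(dtype=float)
    n = values.shape[0]
    out = np.full(n, np.nan)

    # Leading partial windows (fewer than `window` observations so far).
    for i in range(min_periods - 1, min(window - 1, n)):
        out[i] = kurtosis(values[: i + 1], bias=False)

    # All full windows at once: one row per window, reduced along axis 1.
    if n >= window:
        windows = sliding_window_view(values, window)
        out[window - 1:] = kurtosis(windows, axis=1, bias=False)

    return pd.Series(out, index=s.index)


rng = np.random.default_rng(0)
demo = pd.Series(rng.normal(size=200))
print(rolling_kurt_windows(demo).tail())
```

Because no state is carried from one window to the next, a large value can only affect the windows that actually contain it. For a data frame with millions of rows it may be necessary to process the columns in chunks to keep the temporaries small.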
