python - Getting a cumulative average across groups in Python

I am trying to compute a cumulative average within separate groups in Python.
I have data like this:

id  date        value
1   2019-01-01  2
1   2019-01-02  8
1   2019-01-04  3
1   2019-01-08  4
1   2019-01-10  12
1   2019-01-13  6
2   2019-01-01  4
2   2019-01-03  2
2   2019-01-04  3
2   2019-01-06  6
2   2019-01-11  1

I am trying to get output like this:

id  date        value   cumulative_avg
1   2019-01-01  2   NaN
1   2019-01-02  8   2
1   2019-01-04  3   5
1   2019-01-08  4   4.33
1   2019-01-10  12  4.25
1   2019-01-13  6   5.8
2   2019-01-01  4   NaN
2   2019-01-03  2   4
2   2019-01-04  3   3
2   2019-01-06  6   3
2   2019-01-11  1   3.75

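I need the cumulative average to restart for each new id.
I can get a variant of what I'm looking for with a single id. For example, if the dataset only had data for id = 1, I could use: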

df['cumulative_avg'] = df['value'].expanding().mean().shift(1)

I tried adding a group by to it, but I get an error:

df['cumulative_avg'] = df.groupby('id')['value'].expanding().mean().shift(1)

TypeError: incompatible index of inserted column with frame index
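
A likely cause of this error is that the grouped expanding mean comes back indexed by both the group key and the original row label, so it no longer lines up with the frame's own index. The sketch below illustrates this on the sample values, assuming the default integer index; the names expanding_mean, naive and fixed are made up for illustration. It also shows that a plain .shift(1) on the grouped result would carry a value across the boundary between ids, so the shift is done per group before the extra index level is dropped.

```python
import pandas as pd

# Sample values from the question (dates omitted; assumes a default integer index).
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "value": [2, 8, 3, 4, 12, 6, 4, 2, 3, 6, 1],
})

# The grouped expanding mean is indexed by (id, original row label).
expanding_mean = df.groupby("id")["value"].expanding().mean()
print(expanding_mean.index.nlevels)  # 2

# A plain shift ignores group boundaries: the first row of id 2
# receives the last mean of id 1 instead of NaN.
naive = expanding_mean.shift(1)

# Shifting within each id and then dropping the id level restores
# an index that aligns with df.
fixed = expanding_mean.groupby(level=0).shift(1).droplevel(0)
df["cumulative_avg"] = fixed
print(df)
```

This keeps the expanding() formulation from the question, but it is not tuned for speed, which matters on millions of rows.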

I also tried:

df.set_index(['account']
ValueError: cannot handle a non-unique multi-index!

My actual data has millions of rows and thousands of unique ids. Any help doing this in a fast/efficient way would be appreciated.

Best answer

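This performs better with many groups because it avoids apply. Take the cumsum, subtract the current value, and divide by the cumcount to get the equivalent of the shifted expanding mean. Conveniently, pandas evaluates 0/0 as NaN, so the first row of each group comes out as NaN.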

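# Group once and reuse the grouped column for both cumulative operations.
gp = df.groupby('id')['value']
# Sum of the earlier rows in each group divided by how many there are (0/0 -> NaN on the first row).
df['cum_avg'] = (gp.cumsum() - df['value'])/gp.cumcount()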

    id        date  value   cum_avg
0    1  2019-01-01      2       NaN
1    1  2019-01-02      8  2.000000
2    1  2019-01-04      3  5.000000
3    1  2019-01-08      4  4.333333
4    1  2019-01-10     12  4.250000
5    1  2019-01-13      6  5.800000
6    2  2019-01-01      4       NaN
7    2  2019-01-03      2  4.000000
8    2  2019-01-04      3  3.000000
9    2  2019-01-06      6  3.000000
10   2  2019-01-11      1  3.750000
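
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch that rebuilds the sample data and cross-checks the result against a slower per-group expanding mean. The sort step and the name reference are additions for illustration. It assumes rows should be ordered by date within each id and that value has no missing entries, since cumsum skips NaN while cumcount still counts the row.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "date": pd.to_datetime([
        "2019-01-01", "2019-01-02", "2019-01-04", "2019-01-08",
        "2019-01-10", "2019-01-13", "2019-01-01", "2019-01-03",
        "2019-01-04", "2019-01-06", "2019-01-11",
    ]),
    "value": [2, 8, 3, 4, 12, 6, 4, 2, 3, 6, 1],
})

# Assumption: "previous rows" means earlier dates within the same id.
df = df.sort_values(["id", "date"]).reset_index(drop=True)

gp = df.groupby("id")["value"]
# Running sum without the current row, divided by the number of earlier rows.
df["cum_avg"] = (gp.cumsum() - df["value"]) / gp.cumcount()

# Slower reference: expanding mean shifted by one row inside each group.
reference = gp.transform(lambda s: s.expanding().mean().shift(1))
assert np.allclose(df["cum_avg"], reference, equal_nan=True)

print(df)
```

The cumsum/cumcount version only uses built-in grouped operations, whereas the reference calls a Python function once per group, which is where the cost of the apply-style approach tends to come from with thousands of ids.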

Regarding python - Getting a cumulative average across groups in Python, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59759856/
