python - Using shift and rolling in pandas with groupby

Tags: python pandas dataframe pandas-groupby

import numpy as np
import pandas as pd

# 10 ids with 10 consecutive days each; C runs from 10 to 109
df = pd.DataFrame({
    "A": np.array(["id %02d" % i for i in range(1, 11)]).repeat(10),
    "B": pd.date_range("2018-01-01", periods=100).strftime("%Y-%m-%d"),
    "C": list(range(10, 110)),
})

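# Sum C per (A, B); A and B become the levels of a MultiIndex
df = df.groupby(["A", "B"]).sum()

# Mean of the two previous rows, computed over the whole column
df["D"] = df["C"].shift(1).rolling(2).mean()

df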

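This code produces a frame indexed by A and B, with the summed column C and the new column D.

I want the rolling logic to restart for each new id. Right now, id 02 uses the last two values of id 01 to compute its mean.

How can I achieve this?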
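To make the problem concrete, here is a minimal sketch on a smaller frame. The `toy` frame and its labels (`d1`, `d2`, `d3`) are hypothetical and only stand in for the 100-row data above.

```python
import pandas as pd

# Hypothetical toy data: two ids with three rows each.
toy = pd.DataFrame(
    {"C": [10, 11, 12, 20, 21, 22]},
    index=pd.MultiIndex.from_product(
        [["id 01", "id 02"], ["d1", "d2", "d3"]], names=["A", "B"]
    ),
)

# shift and rolling run over the whole column and ignore the id boundary.
toy["D"] = toy["C"].shift(1).rolling(2).mean()
print(toy)

# The first "id 02" row averages the last two values of "id 01" (11 and 12).
leaked = toy.loc[("id 02", "d1"), "D"]
print(leaked)  # 11.5
```

The second `id 02` row is affected as well: it averages 12 (from `id 01`) with 20 (from `id 02`).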

Best answer

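I believe you need groupby:

# After the groupby/sum in the question, A is an index level, not a column,
# so group on level="A"; transform keeps the original index for assignment.
df['D'] = df["C"].shift(1).groupby(level="A").transform(lambda s: s.rolling(2).mean())
print(df.head(20))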
                   C     D
A     B                   
id 01 2018-01-01  10   NaN
      2018-01-02  11   NaN
      2018-01-03  12  10.5
      2018-01-04  13  11.5
      2018-01-05  14  12.5
      2018-01-06  15  13.5
      2018-01-07  16  14.5
      2018-01-08  17  15.5
      2018-01-09  18  16.5
      2018-01-10  19  17.5
id 02 2018-01-11  20   NaN
      2018-01-12  21  19.5
      2018-01-13  22  20.5
      2018-01-14  23  21.5
      2018-01-15  24  22.5
      2018-01-16  25  23.5
      2018-01-17  26  24.5
      2018-01-18  27  25.5
      2018-01-19  28  26.5
      2018-01-20  29  27.5

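Or group before shifting. In the output above, the 19.5 on 2018-01-12 still averages the last value of id 01 (19) with the first value of id 02 (20), because shift(1) ran over the whole column before the grouping. Shifting within each group removes that leak:

# A is an index level here, so group on level="A" rather than df['A']
df['D'] = df["C"].groupby(level="A").shift(1).rolling(2).mean()
print(df.head(20))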
                   C     D
A     B                   
id 01 2018-01-01  10   NaN
      2018-01-02  11   NaN
      2018-01-03  12  10.5
      2018-01-04  13  11.5
      2018-01-05  14  12.5
      2018-01-06  15  13.5
      2018-01-07  16  14.5
      2018-01-08  17  15.5
      2018-01-09  18  16.5
      2018-01-10  19  17.5
id 02 2018-01-11  20   NaN
      2018-01-12  21   NaN
      2018-01-13  22  20.5
      2018-01-14  23  21.5
      2018-01-15  24  22.5
      2018-01-16  25  23.5
      2018-01-17  26  24.5
      2018-01-18  27  25.5
      2018-01-19  28  26.5
      2018-01-20  29  27.5
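As a possible alternative, both steps can run inside each group, so nothing depends on the NaN that the per-group shift leaves at the start of every id. The sketch below assumes the frame from the question; `lagged_rolling_mean` is a hypothetical helper name, not part of pandas.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question: 10 ids x 10 days, C = 10..109.
df = pd.DataFrame({
    "A": np.repeat(["id %02d" % i for i in range(1, 11)], 10),
    "B": pd.date_range("2018-01-01", periods=100).strftime("%Y-%m-%d"),
    "C": np.arange(10, 110),
}).groupby(["A", "B"]).sum()


def lagged_rolling_mean(s, window=2, lag=1):
    """Hypothetical helper: mean of the `window` values before the current row."""
    return s.shift(lag).rolling(window).mean()


# transform calls the helper once per id and returns a result aligned
# with df's index, so it can be assigned straight back.
df["D"] = df.groupby(level="A")["C"].transform(lagged_rolling_mean)

print(df.loc["id 02"].head(4))
```

With the default `min_periods`, this gives the same D as the second variant above (the first two rows of every id are NaN). The difference would show up if `min_periods=1` were passed to `rolling`: the second variant would then average across the id boundary again, while the per-group version would not.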

For python - Using shift and rolling in pandas with groupby, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/48967165/
