python - Pandas rolling max with groupby

Tags: python python-3.x pandas dataframe group-by

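I'm having trouble getting Pandas' rolling function to do what I want. For each row, I want to compute the maximum value seen so far within that row's group. Here is an example: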

import pandas as pd

df = pd.DataFrame([[1,3], [1,6], [1,3], [2,2], [2,1]], columns=['id', 'value'])

which looks like

   id  value
0   1      3
1   1      6
2   1      3
3   2      2
4   2      1

Now I would like to obtain the following DataFrame:

   id  value
0   1      3
1   1      6
2   1      6
3   2      2
4   2      2

The problem is that when I do

df.groupby('id')['value'].rolling(1).max()

I get the same DataFrame back. And when I do

df.groupby('id')['value'].rolling(3).max()

I get a DataFrame with NaNs. Can someone explain how to use rolling properly, or some other Pandas function, to get the DataFrame I want?

Best answer

It looks like you need cummax() rather than .rolling(N).max()

In [29]: df['new'] = df.groupby('id').value.cummax()

In [30]: df
Out[30]:
   id  value  new
0   1      3    3
1   1      6    6
2   1      3    6
3   2      2    2
4   2      1    2
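
As a minimal sketch of why the two rolling calls in the question behave as they do: rolling(1) looks at one row at a time, so it returns the values unchanged, while rolling(3) by default requires three observations per window (min_periods defaults to the window size), which leaves NaN wherever a group has fewer than three rows so far. The example also shows expanding().max() as the window-style counterpart of cummax(). The variable names and the new_expanding column are illustrative only and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 3], [1, 6], [1, 3], [2, 2], [2, 1]], columns=['id', 'value'])

grouped = df.groupby('id')['value']

# rolling(3) needs 3 observations per window by default (min_periods equals
# the window size), so rows with fewer than 3 values so far become NaN.
rolling3 = grouped.rolling(3).max()

# expanding() grows the window from the start of each group, which matches
# "maximum so far". The result carries an extra 'id' index level, dropped
# here so that it aligns with the original row labels.
expanding_max = grouped.expanding().max().reset_index(level=0, drop=True)

# cummax() returns the same values directly, already aligned with df.
df['new'] = grouped.cummax()
df['new_expanding'] = expanding_max.astype(int)

print(rolling3)
print(df)
```

If a window-based spelling is not needed, cummax() is the shorter option, since it keeps the original index and the integer dtype without any reshaping.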

Timing (using a brand-new Pandas version, 0.20.1):

In [3]: df = pd.concat([df] * 10**4, ignore_index=True)

In [4]: df.shape
Out[4]: (50000, 2)

In [5]: %timeit df.groupby('id').value.apply(lambda x: x.cummax())
100 loops, best of 3: 15.8 ms per loop

In [6]: %timeit df.groupby('id').value.cummax()
100 loops, best of 3: 4.09 ms per loop
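
The comparison above relies on IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard timeit module. The helper names via_apply and via_cummax are hypothetical, and group_keys=False is an assumption added so that the apply variant keeps the original row labels across pandas versions. Absolute timings depend on the machine and the pandas version, so none are shown here.

```python
import timeit

import pandas as pd

df = pd.DataFrame([[1, 3], [1, 6], [1, 3], [2, 2], [2, 1]], columns=['id', 'value'])
df = pd.concat([df] * 10**4, ignore_index=True)  # 50000 rows, as in the timing above


def via_apply(frame):
    # Calls cummax once per group through a Python-level lambda.
    # group_keys=False keeps the original row labels on the result.
    return frame.groupby('id', group_keys=False)['value'].apply(lambda x: x.cummax())


def via_cummax(frame):
    # Uses the built-in grouped cumulative maximum.
    return frame.groupby('id')['value'].cummax()


# Check that both variants compute the same values before timing them.
same = via_apply(df).sort_index().tolist() == via_cummax(df).sort_index().tolist()

# Average seconds per call over 10 runs each.
apply_seconds = timeit.timeit(lambda: via_apply(df), number=10) / 10
cummax_seconds = timeit.timeit(lambda: via_cummax(df), number=10) / 10

print(f"identical results: {same}")
print(f"apply:  {apply_seconds * 1000:.2f} ms per call")
print(f"cummax: {cummax_seconds * 1000:.2f} ms per call")
```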

Note: from the Pandas 0.20.0 what's new:

Improved performance of groupby().cummin() and groupby().cummax() (GH15048, GH15109, GH15561, GH15635)

Source: a similar question on Stack Overflow, "python - Pandas rolling max with groupby": https://stackoverflow.com/questions/43830545/
