python - Group a DataFrame by random events and set a new column with the group count

I've been struggling to categorize a dataset; maybe someone can help me or point me in the right direction.

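I have a DataFrame containing a sequence of events that happen one after another, and at some random point an event is registered in one of the columns. It looks like this: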

       Timestamp         Event
0  10/26/2015 22:50:15     0
1  10/26/2015 22:50:46     0
2  10/26/2015 22:50:50     0
3  10/26/2015 22:50:51     0
4  10/26/2015 22:51:15     1
5  10/26/2015 22:51:47     0
6  10/26/2015 22:52:38     0
7  10/26/2015 22:54:46     1
8  10/26/2015 22:55:46     0

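I need to create a new column that identifies each group of records leading up to (and including) each occurrence of event "1", and numbers those groups with a counter. The result should look like this: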

       Timestamp         Event   Group
0  10/26/2015 22:50:15     0     1
1  10/26/2015 22:50:46     0     1
2  10/26/2015 22:50:50     0     1
3  10/26/2015 22:50:51     0     1
4  10/26/2015 22:51:15     1     1
5  10/26/2015 22:51:47     0     2
6  10/26/2015 22:52:38     0     2
7  10/26/2015 22:54:46     1     2

Note that records that do not lead up to a "1" event (here, the trailing row 8) are left out of the result.

Best answer

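You can use cumsum() on the Event column, which gives a new group ID every time a 1 is encountered. Combine it with shift(), so that the row holding the "1" stays in the group it closes, and you can create the Group column as expected: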

df['Group'] = df.Event.shift().cumsum().fillna(0).astype(int) + 1

# filter out the trailing records after the last "1" event
df.loc[df.index <= df.Event.iloc[::-1].idxmax()]
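A self-contained sketch of the first approach on the question's sample data, assuming pandas 0.24 or later (for `shift(fill_value=...)`). As a variant of the answer, it fills the shifted gap with 0 up front so Group stays an integer column, and it swaps the `idxmax()` filter for a reversed `cummax()` mask, which does not rely on the index being sorted. The names `shifted`, `reaches_event` and `result` are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": [
        "10/26/2015 22:50:15", "10/26/2015 22:50:46", "10/26/2015 22:50:50",
        "10/26/2015 22:50:51", "10/26/2015 22:51:15", "10/26/2015 22:51:47",
        "10/26/2015 22:52:38", "10/26/2015 22:54:46", "10/26/2015 22:55:46",
    ],
    "Event": [0, 0, 0, 0, 1, 0, 0, 1, 0],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y %H:%M:%S")

# shift() moves every event down one row, so a "1" only starts a new group on
# the row after it; the "1" row itself stays in the group it closes.
shifted = df["Event"].shift(fill_value=0)

# cumsum() counts the events closed so far; +1 makes group numbers start at 1.
df["Group"] = shifted.cumsum() + 1

# Reversed cummax is 1 for every row at or before the last "1", so this keeps
# only the rows that lead up to an event and drops the trailing ones.
reaches_event = df["Event"].iloc[::-1].cummax().iloc[::-1].astype(bool)
result = df[reaches_event]
print(result)
```

The trailing row still receives a Group value (3) before the filter is applied; it disappears only because the mask is False for it.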


Another option:

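g = df.Event.iloc[::-1].cumsum()            # number of "1" events at or after each row
df.loc[g != 0, 'Group'] = g.max() - g + 1   # rows after the last "1" (g == 0) stay NaN
df.dropna(subset=['Group'])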
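The alternative works by counting, for each row, how many "1" events are still ahead of it: the first group has the most events remaining, so `g.max() - g + 1` flips that count into a group number. The sketch below wraps the idea in a hypothetical helper, `label_groups_reverse`, and reverses `g` back to the original row order before using it, so the mask and the assigned values line up row for row. It uses only the Event column from the question.

```python
import pandas as pd

# Same event sequence as in the question (Timestamp omitted for brevity).
df = pd.DataFrame({"Event": [0, 0, 0, 0, 1, 0, 0, 1, 0]})


def label_groups_reverse(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: label event groups via a reversed cumulative sum."""
    # g[i] = number of "1" events at or after row i, back in original order.
    g = frame["Event"].iloc[::-1].cumsum().iloc[::-1]
    # Rows with g == 0 come after the last event, so they are dropped here.
    out = frame[g > 0].copy()
    # Flip "events remaining" into a group number that starts at 1.
    out["Group"] = g.max() - g[g > 0] + 1
    return out


print(label_groups_reverse(df))
```

Because the rows after the last event have `g == 0`, they fall out in the same step that assigns the groups, so no separate `dropna()` is needed, and Group keeps an integer dtype.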

Regarding "python - Group a DataFrame by random events and set a new column with the group count", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/41686269/
