python - Pandas: identify the column value of the first row in a group

I have a dataframe with three columns:

    ID       Date    Status
0    1   1/1/2000  Complete
1    1   1/4/2000  ReOpened
2    1  1/10/2000  ReOpened
3    1  1/11/2000    Closed
4    1  1/15/2000  ReOpened
5    2   1/2/2000  ReOpened
6    2   1/4/2000  ReOpened
7    2  1/10/2000    Closed
8    3  1/20/2000    Closed
9    3  1/22/2000    Closed
10   4  1/25/2000  ReOpened

For each ID that has a "ReOpened" status, I need the row showing the first "ReOpened" by date. So my output would look like:

   ID ProductionDate    Status
0   1       1/4/2000  ReOpened
1   2       1/2/2000  ReOpened
2   4      1/25/2000  ReOpened

I tried: df = pd.np.where(df.Status.str.contains("ReOpened"), df.groupby(['ID']).first(),0) but it does not work.

Best answer

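Build a boolean mask of the ReOpened rows, take its running count within each ID with groupby and cumsum, and keep the ReOpened rows where that count is 1:

m = df['Status'].eq('ReOpened')
df[m & (m.groupby(df['ID']).cumsum() == 1)]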

    ID       Date    Status
1    1   1/4/2000  ReOpened
5    2   1/2/2000  ReOpened
10   4  1/25/2000  ReOpened
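Filtering on the running count alone (`cumsum() == 1`) also matches rows that are not ReOpened but come after the first ReOpened row of an ID, before the next one. The sample data above happens not to contain such a row. A minimal sketch on small hypothetical data comparing the count-only filter with one that also requires the row itself to be ReOpened (the count is cast to `int` before summing, to be explicit about the dtype):

```python
import pandas as pd

# Hypothetical data: for ID 1 a "Closed" row follows the first "ReOpened"
df = pd.DataFrame({
    "ID": [1, 1, 1, 2],
    "Date": ["1/1/2000", "1/4/2000", "1/5/2000", "1/2/2000"],
    "Status": ["ReOpened", "Closed", "ReOpened", "ReOpened"],
})

is_reopened = df["Status"].eq("ReOpened")
# Running count of ReOpened rows within each ID
running = is_reopened.astype(int).groupby(df["ID"]).cumsum()

# Count alone: also keeps the Closed row at index 1 (its running count is still 1)
count_only = df[running == 1]
# Count plus mask: only the first ReOpened row of each ID
first_reopened = df[is_reopened & (running == 1)]

print(count_only)
print(first_reopened)
```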

You can also filter first, then use groupby and first to keep only the first row per ID:

df[df['Status'].eq('ReOpened')].groupby('ID', as_index=False).first()  

   ID       Date    Status
0   1   1/4/2000  ReOpened
1   2   1/2/2000  ReOpened
2   4  1/25/2000  ReOpened
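`GroupBy.first()` takes the first non-null value of each column separately within each group, so on data with missing values the result can mix values from different rows. A minimal sketch using `GroupBy.head(1)` instead, which returns the first whole row of each group and keeps the original index labels:

```python
import pandas as pd

# The sample data from the question
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Date": ["1/1/2000", "1/4/2000", "1/10/2000", "1/11/2000", "1/15/2000",
             "1/2/2000", "1/4/2000", "1/10/2000", "1/20/2000", "1/22/2000",
             "1/25/2000"],
    "Status": ["Complete", "ReOpened", "ReOpened", "Closed", "ReOpened",
               "ReOpened", "ReOpened", "Closed", "Closed", "Closed",
               "ReOpened"],
})

reopened = df[df["Status"].eq("ReOpened")]
# head(1) returns the first row of each ID group unchanged (index labels 1, 5, 10)
first_rows = reopened.groupby("ID").head(1)
print(first_rows)
```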

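If performance matters, you can reduce this to a single boolean indexing operation with eq and duplicated. duplicated(['ID', 'Status']) is False only for the first occurrence of each (ID, Status) pair, so combined with the ReOpened test it selects the first ReOpened row of each ID: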

df[df['Status'].eq('ReOpened') & ~df.duplicated(['ID', 'Status'])] 

    ID       Date    Status
1    1   1/4/2000  ReOpened
5    2   1/2/2000  ReOpened
10   4  1/25/2000  ReOpened
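All of these approaches take the first matching row in the frame's current order, which equals "first by date" only when rows are already in chronological order within each ID, as in the sample. Sorting the raw strings would not help, since "1/10/2000" sorts before "1/4/2000" lexically. A minimal sketch, assuming month/day/year dates, that parses them with `pd.to_datetime`, sorts by ID and date, then applies the duplicated-based filter (the unsorted input is hypothetical):

```python
import pandas as pd

# Hypothetical input whose rows are not in date order within each ID
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2],
    "Date": ["1/10/2000", "1/4/2000", "1/1/2000", "1/4/2000", "1/2/2000"],
    "Status": ["ReOpened", "ReOpened", "Complete", "ReOpened", "ReOpened"],
})

# Parse month/day/year strings so sorting is chronological, not lexical
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
df = df.sort_values(["ID", "Date"])

# First occurrence of each (ID, Status) pair, restricted to ReOpened rows
mask = df["Status"].eq("ReOpened") & ~df.duplicated(["ID", "Status"])
result = df[mask]
print(result)
```

Sorting once up front keeps the filter itself a single vectorized boolean index, which is the point of the duplicated approach.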

Original question on Stack Overflow: https://stackoverflow.com/questions/56709170/
