python - How to conditionally label groups?

Tags: python pandas

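I'm new to pandas and would like to know how to do the following: given a specific condition, I want to label an entire group with a specific label, not just the rows that satisfy the condition. For example, if I have a DataFrame like this: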

import numpy as np
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8],
                        "process": ["pending", "finished", "finished", "finished", "finished", "finished", "finished", "pending"],
                        "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
                        "size": [2, 2, 1, 2, 2, 1, 2, 2]})

conditions = [(df['size'] >= 2) & (df['process'].isin(["pending"]))]

choices = ["not_done"]

df['state'] = np.select(conditions, choices, default="something_else")

df:

   id   process working_group  size           state
0   1   pending             a     2        not_done
1   2  finished             a     2  something_else
2   3  finished             c     1  something_else
3   4  finished             d     2  something_else
4   5  finished             d     2  something_else
5   6  finished             f     1  something_else
6   7  finished             g     2  something_else
7   8   pending             g     2        not_done

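However, when even a single task in a working group is pending, I want the whole working group to be labeled not_done, so for example groups a and g should be labeled not_done: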

   id   process working_group  size     state
0   1   pending             a     2  not_done
1   2  finished             a     2  not_done
2   3  finished             c     1  something_else
3   4  finished             d     2  something_else
4   5  finished             d     2  something_else
5   6  finished             f     1  something_else
6   7  finished             g     2  not_done
7   8   pending             g     2  not_done

Best answer

You can use:

condition = df['size'].ge(2) & df['process'].isin(["pending"])

df['state'] = np.where(condition.groupby(df['working_group']).transform('any'), 'not_done', 'something_else')
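
To see why this works, it can help to look at the two masks side by side. The sketch below rebuilds the DataFrame from the question and keeps the intermediate result in a variable; the name group_mask is introduced here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "process": ["pending", "finished", "finished", "finished",
                "finished", "finished", "finished", "pending"],
    "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
    "size": [2, 2, 1, 2, 2, 1, 2, 2],
})

# Row-level mask: True only on rows that are themselves pending with size >= 2.
condition = df["size"].ge(2) & df["process"].isin(["pending"])

# Group-level mask (illustrative name): transform("any") broadcasts the
# per-group result back to every row, so a single True marks the whole group.
group_mask = condition.groupby(df["working_group"]).transform("any")

df["state"] = np.where(group_mask, "not_done", "something_else")

# Compare the two masks per row.
print(pd.DataFrame({
    "working_group": df["working_group"],
    "row_level": condition,
    "group_level": group_mask,
}))
```

The key point is that transform returns a Series with the same index as the input, which is what lets np.where line it up with the original rows.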

Alternatively:

condition = df['size'].ge(2) & df['process'].isin(["pending"])

df['state'] = np.where(df['working_group'].isin(df.loc[condition, 'working_group']), 'not_done', 'something_else')
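
This second variant avoids groupby entirely: it first collects the labels of the groups that contain at least one matching row, then tests every row's group for membership in that set. A minimal sketch with the steps split out, where flagged_groups and in_flagged are names assumed here for readability:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "process": ["pending", "finished", "finished", "finished",
                "finished", "finished", "finished", "pending"],
    "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
    "size": [2, 2, 1, 2, 2, 1, 2, 2],
})

condition = df["size"].ge(2) & df["process"].isin(["pending"])

# Labels of the working groups that have at least one matching row.
flagged_groups = df.loc[condition, "working_group"].unique()

# True for every row whose working_group is one of those labels.
in_flagged = df["working_group"].isin(flagged_groups)

df["state"] = np.where(in_flagged, "not_done", "something_else")
print(df)
```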

Output:

   id   process working_group  size           state
0   1   pending             a     2        not_done
1   2  finished             a     2        not_done
2   3  finished             c     1  something_else
3   4  finished             d     2  something_else
4   5  finished             d     2  something_else
5   6  finished             f     1  something_else
6   7  finished             g     2        not_done
7   8   pending             g     2        not_done
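
If you would rather keep the np.select layout from the question, for instance because more labels may be added later, the group-level mask can take the place of the row-level one in the conditions list. This is only a sketch under that assumption; row_mask and group_mask are illustrative names, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "process": ["pending", "finished", "finished", "finished",
                "finished", "finished", "finished", "pending"],
    "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
    "size": [2, 2, 1, 2, 2, 1, 2, 2],
})

# Same row-level test as in the question.
row_mask = (df["size"] >= 2) & (df["process"] == "pending")

# Spread each group's "any row matched" result over all rows of that group.
group_mask = row_mask.groupby(df["working_group"]).transform("any")

# Same conditions/choices structure as the question, now with the group mask.
conditions = [group_mask]
choices = ["not_done"]

df["state"] = np.select(conditions, choices, default="something_else")
print(df)
```

Further condition and choice pairs can be appended to the two lists; np.select picks the first condition that is True for each row.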

Regarding "python - How to conditionally label groups?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74481307/
