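I'm new to pandas and would like to know how to do the following: given a specific condition, I want to label the entire group with a particular label, not just the rows that meet the condition. For example, if I have a DataFrame like this: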
import numpy as np
import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8],
"process": ["pending", "finished", "finished", "finished", "finished", "finished", "finished", "pending"],
"working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
"size": [2, 2, 1, 2, 2, 1, 2, 2]})
conditions = [(df['size'] >= 2) & (df['process'].isin(["pending"]))]
choices = ["not_done"]
df['state'] = np.select(conditions, choices, default="something_else")
df:
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 something_else
2 3 finished c 1 something_else
3 4 finished d 2 something_else
4 5 finished d 2 something_else
5 6 finished f 1 something_else
6 7 finished g 2 something_else
7 8 pending g 2 not_done
However, when even a single task is pending, I want the whole working group to be labeled not_done, so for example groups a and g should be labeled not_done:
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 not_done
2 3 finished c 1 something_else
3 4 finished d 2 something_else
4 5 finished d 2 something_else
5 6 finished f 1 something_else
6 7 finished g 2 not_done
7 8 pending g 2 not_done
Best answer
You can use:
condition = df['size'].ge(2) & df['process'].isin(["pending"])
df['state'] = np.where(condition.groupby(df['working_group']).transform('any'), 'not_done', 'something_else')
Or:
condition = df['size'].ge(2) & df['process'].isin(["pending"])
df['state'] = np.where(df['working_group'].isin(df.loc[condition, 'working_group']), 'not_done', 'something_else')
Output:
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 not_done
2 3 finished c 1 something_else
3 4 finished d 2 something_else
4 5 finished d 2 something_else
5 6 finished f 1 something_else
6 7 finished g 2 not_done
7 8 pending g 2 not_done
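To see why both variants label whole groups, it can help to look at the intermediate masks. The following is a minimal sketch that rebuilds the example DataFrame from the question; the names group_mask, flagged_groups, and isin_mask are introduced here for illustration only and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "process": ["pending", "finished", "finished", "finished",
                "finished", "finished", "finished", "pending"],
    "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
    "size": [2, 2, 1, 2, 2, 1, 2, 2],
})

# Row-level condition: True only on the rows that are themselves pending
# and have size >= 2 (rows 0 and 7 in this data).
condition = df["size"].ge(2) & df["process"].isin(["pending"])

# Variant 1: group the boolean Series by working_group and broadcast
# "is any row in this group True?" back onto every row of the group.
group_mask = condition.groupby(df["working_group"]).transform("any")

# Variant 2: collect the working groups that contain a matching row,
# then test each row's group for membership in that collection.
flagged_groups = df.loc[condition, "working_group"]
isin_mask = df["working_group"].isin(flagged_groups)

# Either mask can drive np.where; here the first one is used.
df["state"] = np.where(group_mask, "not_done", "something_else")
print(df)
```

Both masks are aligned with the original index of df, which is why they can be passed straight to np.where. The isin variant avoids a groupby altogether, which may be simpler to read when there is only one grouping column.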
Regarding "python - How to conditionally label groups?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/74481307/