import pandas as pd

data = {
    'org_id': [4, 73, 6, 77, 21, 36, 40, 22, 21, 30, 31],
    'flag': [['4', '73'], ['73'], ['6', '77'], ['77'], ['21'], ['36', '36'], ['40'], ['22', '41'], ['21'], ['22', '30'], ['31', '31']],
    'r_id': [4, 4, 6, 6, 20, 20, 20, 22, 28, 28, 28],
}
df = pd.DataFrame.from_dict(data)
df
The desired DataFrame looks like this:
data = {
    'org_id': [4, 73, 6, 77, 21, 36, 40, 22, 21, 30, 31],
    'flag': [['4', '73'], ['73'], ['6', '77'], ['77'], ['21'], ['36', '36'], ['40'], ['22', '41'], ['21'], ['22', '30'], ['31', '31']],
    'r_id': [4, 4, 6, 6, 20, 20, 20, 22, 28, 28, 28],
    'is_foundin_org_id': ['yes', 'yes', 'yes', 'yes', 'NO', 'NO', 'NO', 'yes', 'NO', 'NO', 'NO'],
}
df2 = pd.DataFrame.from_dict(data)
df2
Output DataFrame:
Out[115]:
    org_id      flag  r_id  is_foundin_org_id
 0       4   [4, 73]     4                yes
 1      73      [73]     4                yes
 2       6   [6, 77]     6                yes
 3      77      [77]     6                yes
 4      21      [21]    20                 NO
 5      36  [36, 36]    20                 NO
 6      40      [40]    20                 NO
 7      22  [22, 41]    22                yes
 8      21      [21]    28                 NO
 9      30  [22, 30]    28                 NO
10      31  [31, 31]    28                 NO
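After grouping by r_id, I need to identify whether each group's r_id value appears in the org_id column of any row in that group. For example, the value 4 is found in org_id in one of the rows of group 4, so I mark every row of group 4 as "yes". Similarly, 20 is not found in the org_id column of its group, so I mark every row of group 20 as "NO". Thank you.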
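To make the rule concrete, here is a minimal sketch that spells it out as an explicit per-group check. It assumes only the `org_id` and `r_id` columns matter (the `flag` column is left out for brevity), and the helper name `found` is introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "org_id": [4, 73, 6, 77, 21, 36, 40, 22, 21, 30, 31],
    "r_id": [4, 4, 6, 6, 20, 20, 20, 22, 28, 28, 28],
})

# For each r_id group, record whether the group's key appears
# among that group's org_id values.
found = {
    key: bool((group["org_id"] == key).any())
    for key, group in df.groupby("r_id")
}

# Broadcast the per-group result back onto every row of the group,
# then convert the booleans to the labels used in the question.
df["is_foundin_org_id"] = df["r_id"].map(found).map({True: "yes", False: "NO"})
print(df)
```

This loop is easy to read but builds an intermediate dictionary; a vectorized `groupby(...).transform(...)` does the same work in one expression.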
Best answer
Try this:
d = {True: 'Yes', False: 'No'}
df['is_foundin_org_id'] = (df.org_id.eq(df.r_id).groupby(df.r_id)
.transform('max').map(d))
Out[1549]:
    org_id      flag  r_id  is_foundin_org_id
 0       4   [4, 73]     4                Yes
 1      73      [73]     4                Yes
 2       6   [6, 77]     6                Yes
 3      77      [77]     6                Yes
 4      21      [21]    20                 No
 5      36  [36, 36]    20                 No
 6      40      [40]    20                 No
 7      22  [22, 41]    22                Yes
 8      21      [21]    28                 No
 9      30  [22, 30]    28                 No
10      31  [31, 31]    28                 No
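For readers who want to see why the one-liner works, the sketch below splits it into its three steps. The intermediate names `row_match`, `group_match` and `labels` are introduced here for illustration, the `flag` column is omitted, and the label casing is switched to the "yes"/"NO" spelling from the question, which is an assumption about what the asker ultimately wants.

```python
import pandas as pd

df = pd.DataFrame({
    "org_id": [4, 73, 6, 77, 21, 36, 40, 22, 21, 30, 31],
    "r_id": [4, 4, 6, 6, 20, 20, 20, 22, 28, 28, 28],
})

# Step 1: row-wise comparison, True where org_id equals r_id.
row_match = df["org_id"].eq(df["r_id"])

# Step 2: within each r_id group, take the max of the booleans.
# Since True > False, the max is True if at least one row matched,
# and transform() repeats that value on every row of the group.
group_match = row_match.groupby(df["r_id"]).transform("max")

# Step 3: turn the booleans into labels (casing follows the question).
labels = group_match.map({True: "yes", False: "NO"})
print(labels.tolist())
```

Using `transform` rather than a plain aggregation keeps the result aligned with the original index, so it can be assigned straight back as a new column.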
Regarding pandas - checking whether a group ID or element is found in a list column of a pandas DataFrame, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/59142727/