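I have the code below, where I use np.select to check whether the string in a column contains any of the codes and, based on that logic, create a reference column with the matching description.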
import numpy as np

# Create the reference column
col = 'codes_desc'
conditions = [(df_merged[col].str.contains('R27', case=False)),
(df_merged[col].str.contains('R38', case=False)),
(df_merged[col].str.contains('R52', case=False)),
(df_merged[col].str.contains('R62', case=False)),
(df_merged[col].str.contains('R21', case=False)),
(df_merged[col].str.contains('R22', case=False)),
(df_merged[col].str.contains('R23', case=False)),
(df_merged[col].str.contains('R57', case=False)),
(df_merged[col].str.contains('R82', case=False)),
(df_merged[col].str.contains('R86', case=False)),
(df_merged[col].str.contains('R20', case=False)),
(df_merged[col].str.contains('R98', case=False))
]
choices = [
'The person is a Ninja',
'The person is a Pirate',
'The person is a Doctor',
'The person is a Samurai',
'The person is a Admiral',
'The person is a Police',
'The person is a Teacher',
'The person is a Singer',
'The person is a Guitarist',
'The person is a Chef',
'The person is a Runner',
'The person is a Wizard'
]
df_merged["reference"] = np.select(conditions, choices, default= 'Reason Unknown')
However, I found cases in the dataframe where the "codes_desc" column contains two codes, for example:
codes_desc
The selected codes are R27, R22.
In that case, I would like the output in the "reference" column to contain both descriptions:
1. 'The person is a Ninja'
2. 'The person is a Police'
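But np.select works like a case statement, so it returns only one description per row (the one for the first matching condition in the list). How can I get all matching descriptions?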
Best answer
Setup
print(df)
codes_desc
0 fo bar R20
1 R98 grok
2 R98, R21
3 R82
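Solution
Let's use extractall to pull out every matching code, map each code to its description with a mapping dictionary, then groupby the original row index (level=0) and aggregate with join.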
d = {'R27': 'The person is a Ninja',
'R38': 'The person is a Pirate',
'R52': 'The person is a Doctor',
'R62': 'The person is a Samurai',
'R21': 'The person is a Admiral',
'R22': 'The person is a Police',
'R23': 'The person is a Teacher',
'R57': 'The person is a Singer',
'R82': 'The person is a Guitarist',
'R86': 'The person is a Chef',
'R20': 'The person is a Runner',
'R98': 'The person is a Wizard'}
pat = r'\b(%s)\b' % '|'.join(d)
codes = df['codes_desc'].str.extractall(pat)[0]
df['reference'] = codes.map(d).groupby(level=0).agg(', '.join)
Result
codes_desc reference
0 fo bar R20 The person is a Runner
1 R98 grok The person is a Wizard
2 R98, R21 The person is a Wizard, The person is a Admiral
3 R82 The person is a Guitarist
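Note that rows with no matching code end up as NaN with the approach above, whereas the original np.select call used 'Reason Unknown' as its default and matched case-insensitively. A minimal self-contained sketch that keeps both behaviours is shown below. The sample frame and the shortened dictionary are hypothetical, and the upper-casing step assumes all dictionary keys are upper case.

```python
import re

import pandas as pd

# Hypothetical sample data: row 2 uses a lower-case code, row 3 has no code.
df = pd.DataFrame({'codes_desc': ['The selected codes are R27, R22.',
                                  'R98 grok',
                                  'r82 only',
                                  'no code here']})

# Shortened mapping for illustration; use the full dictionary in practice.
d = {'R27': 'The person is a Ninja',
     'R22': 'The person is a Police',
     'R82': 'The person is a Guitarist',
     'R98': 'The person is a Wizard'}

pat = r'\b(%s)\b' % '|'.join(d)

# re.IGNORECASE mirrors case=False from the question.
codes = df['codes_desc'].str.extractall(pat, flags=re.IGNORECASE)[0]

# Upper-case the matches so lower-case hits still find their dictionary key,
# then join all descriptions that belong to the same original row.
ref = codes.str.upper().map(d).groupby(level=0).agg(', '.join)

# Rows without any match are missing from `ref`; fill them with the default.
df['reference'] = ref.reindex(df.index).fillna('Reason Unknown')

print(df['reference'].tolist())
# ['The person is a Ninja, The person is a Police', 'The person is a Wizard',
#  'The person is a Guitarist', 'Reason Unknown']
```

The descriptions are joined in the order the codes appear in the string, not in the order of the dictionary.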
About python - Pandas with multiple conditions met; concatenating the results to create a new column: we found a similar question on Stack Overflow: https://stackoverflow.com/questions/75104512/