python - Pandas: concatenate the results when multiple conditions are met to create a new column

Tags: python pandas dataframe

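I have the code below. I use np.select to check whether the string in a column contains any of a set of codes. Based on that check, I create a reference column with a description.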

import numpy as np

# Create the reference column
col   = 'codes_desc'
codes = ['R27', 'R38', 'R52', 'R62', 'R21', 'R22',
         'R23', 'R57', 'R82', 'R86', 'R20', 'R98']
# One boolean mask per code, in the same order as `choices` below;
# na=False treats missing descriptions as "no match" so np.select gets pure booleans
conditions = [df_merged[col].str.contains(code, case=False, na=False)
              for code in codes]

choices     = [ 
 'The person is a Ninja',
 'The person is a Pirate',
 'The person is a Doctor',
 'The person is a Samurai',
 'The person is a Admiral',
 'The person is a Police',
 'The person is a Teacher',
 'The person is a Singer',
 'The person is a Guitarist',
 'The person is a Chef',
 'The person is a Runner',
 'The person is a Wizard'
]
df_merged["reference"] = np.select(conditions, choices, default= 'Reason Unknown')
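The following is a minimal reproduction of the problem. It uses a hypothetical one-row `df_merged` built from the example string further down and a trimmed list of codes. `np.select` checks the conditions in list order and, for each row, returns the choice of the first condition that is True. Any later matches in the same row are dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame using the example string from the question
df_merged = pd.DataFrame({'codes_desc': ['The selected codes are R27, R22.']})

codes = ['R27', 'R22']
choices = ['The person is a Ninja', 'The person is a Police']
conditions = [df_merged['codes_desc'].str.contains(c, case=False, na=False)
              for c in codes]

# Both conditions are True for this row, but np.select keeps only the first match
df_merged['reference'] = np.select(conditions, choices, default='Reason Unknown')
print(df_merged['reference'].iloc[0])  # The person is a Ninja
```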

However, I found rows in the dataframe where the "codes_desc" column contains two codes, for example:

codes_desc

The selected codes are R27, R22.

In that case, I would like the "reference" column to contain something like:

1. 'The person is a Ninja'
2. 'The person is a Police'

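But np.select works like a CASE statement. It returns only one description per row: the choice for the first condition in the list that matches. How can I get the descriptions for all matching codes?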

Best answer

Setup

print(df)
   codes_desc
0  fo bar R20
1   R98 grok 
2    R98, R21
3         R82
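To reproduce the setup, you can build the frame printed above directly. This is a sketch: the values are copied from the printout, including the trailing space after `grok`.

```python
import pandas as pd

# Rebuild the sample frame shown in the printout above
df = pd.DataFrame({'codes_desc': ['fo bar R20', 'R98 grok ', 'R98, R21', 'R82']})
print(df)
```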

Solution

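First, extract every matching code. Next, map each code to its description with a dictionary. Finally, group by the original row and aggregate the descriptions with join.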

d = {'R27': 'The person is a Ninja',
     'R38': 'The person is a Pirate',
     'R52': 'The person is a Doctor',
     'R62': 'The person is a Samurai',
     'R21': 'The person is a Admiral',
     'R22': 'The person is a Police',
     'R23': 'The person is a Teacher',
     'R57': 'The person is a Singer',
     'R82': 'The person is a Guitarist',
     'R86': 'The person is a Chef',
     'R20': 'The person is a Runner',
     'R98': 'The person is a Wizard'}


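# Alternation of all codes; \b word boundaries make each code match only as a whole token
pat = r'\b(%s)\b' % '|'.join(d)
# One row per match, indexed by (original row, match number); [0] is the capture group
codes = df['codes_desc'].str.extractall(pat)[0]
# Map codes to descriptions, then join all matches that belong to the same original row
df['reference'] = codes.map(d).groupby(level=0).agg(', '.join)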
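The intermediate result shows why `groupby(level=0)` works. `str.extractall` returns one row per match, indexed by a MultiIndex of (original row label, `match` number), so level 0 points back to the source row. The self-contained sketch below runs on the sample data with the mapping trimmed to the codes that actually appear.

```python
import pandas as pd

df = pd.DataFrame({'codes_desc': ['fo bar R20', 'R98 grok ', 'R98, R21', 'R82']})
d = {'R20': 'The person is a Runner', 'R21': 'The person is a Admiral',
     'R82': 'The person is a Guitarist', 'R98': 'The person is a Wizard'}

pat = r'\b(%s)\b' % '|'.join(d)
# One row per match; column 0 holds the first (and only) capture group
codes = df['codes_desc'].str.extractall(pat)[0]
print(codes)
# The first level is the original row label; the second level is named 'match'
print(list(codes.index.names))
```

Row 2 appears twice (match 0 is `R98`, match 1 is `R21`), so grouping on level 0 brings both codes back together.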

Result

   codes_desc                                        reference
0  fo bar R20                           The person is a Runner
1   R98 grok                            The person is a Wizard
2    R98, R21  The person is a Wizard, The person is a Admiral
3         R82                        The person is a Guitarist
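This answer differs from the question's `np.select` version in two ways. First, the regex is case-sensitive, whereas the original used `case=False`. Second, rows with no recognised code get NaN instead of `'Reason Unknown'`: the assignment aligns on the index, and those rows have no matches. The sketch below restores both behaviours. It assumes that upper-casing the extracted code is an acceptable way to look it up in the dictionary, and its sample rows are made up for illustration.

```python
import re
import pandas as pd

d = {'R20': 'The person is a Runner', 'R21': 'The person is a Admiral',
     'R82': 'The person is a Guitarist', 'R98': 'The person is a Wizard'}
# Illustrative rows: a lower-case code, two codes, and no code at all
df = pd.DataFrame({'codes_desc': ['fo bar r20', 'R98, R21', 'no code here']})

pat = r'\b(%s)\b' % '|'.join(map(re.escape, d))
# Case-insensitive match, then normalise each code to the dictionary's upper-case keys
codes = df['codes_desc'].str.extractall(pat, flags=re.IGNORECASE)[0].str.upper()
joined = codes.map(d).groupby(level=0).agg(', '.join)
# Rows without any match are absent from `joined`; fill them with the question's default
df['reference'] = joined.reindex(df.index).fillna('Reason Unknown')
print(df)
```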

For "python - Pandas: concatenate the results when multiple conditions are met to create a new column", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/75104512/
