I have the following DataFrame in pandas:
job_desig          salary
senior analyst     12
junior researcher  5
scientist          20
sr analyst         12
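Now I want to generate a flag column that is set to 1 when job_desig contains any of the following strings, and 0 otherwise: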
sr = ['senior','sr']
job_desig          salary  senior_profile
senior analyst     12      1
junior researcher  5       0
scientist          20      0
sr analyst         12      1
This is what I am trying in pandas:
df['senior_profile'] = [1 if x.str.contains(sr) else 0 for x in df['job_desig']]
Best answer
You can join all values of the list with | (the regex OR operator), pass the resulting pattern to Series.str.contains, and finally cast the True/False result to integers to get 1/0:
df['senior_profile'] = df['job_desig'].str.contains('|'.join(sr)).astype(int)
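For reference, here is a self-contained sketch of that one-liner. It rebuilds the DataFrame from the question by hand; the variable name pattern is only illustrative and not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
    "salary": [12, 5, 20, 12],
})
sr = ["senior", "sr"]

# Joining with "|" gives the regex alternation "senior|sr".
pattern = "|".join(sr)

# str.contains returns a boolean Series; astype(int) maps True/False to 1/0.
df["senior_profile"] = df["job_desig"].str.contains(pattern).astype(int)
print(df)
```

Note that this matches substrings anywhere in the text, so a short keyword such as sr can also match inside a longer word.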
If necessary, add word boundaries so that only whole words match:
pat = '|'.join(r"\b{}\b".format(x) for x in sr)
df['senior_profile'] = df['job_desig'].str.contains(pat).astype(int)
print(df)
           job_desig  salary  senior_profile
0     senior analyst      12               1
1  junior researcher       5               0
2          scientist      20               0
3         sr analyst      12               1
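The difference between the plain and the word-boundary pattern only shows up on data where a keyword occurs inside another word. The sketch below uses hypothetical job titles that are not in the original question, and adds three optional extras as assumptions about what real data might need: re.escape in case a keyword contains regex metacharacters, case=False for mixed-case titles, and na=False so missing values count as no match.

```python
import re
import pandas as pd

sr = ["senior", "sr"]

# Hypothetical titles (not from the question): "classroom" contains "sr".
titles = pd.Series(["Sr Analyst", "classroom assistant", "SENIOR engineer", None])

plain = "|".join(sr)
# re.escape guards against keywords that contain regex metacharacters.
bounded = "|".join(r"\b{}\b".format(re.escape(x)) for x in sr)

# case=False ignores case; na=False treats missing values as "no match".
loose = titles.str.contains(plain, case=False, na=False).astype(int)
strict = titles.str.contains(bounded, case=False, na=False).astype(int)

print(pd.DataFrame({"title": titles, "loose": loose, "strict": strict}))
```

With the plain pattern, "classroom assistant" is flagged because of the embedded "sr"; with word boundaries it is not.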
A solution with sets, if the list contains only single-word values:
df['senior_profile'] = [int(bool(set(sr).intersection(x.split()))) for x in df['job_desig']]
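A slightly more readable variant of the set approach might look like the sketch below. The helper name has_keyword is hypothetical. Because the comparison is done on whitespace-separated tokens, it behaves like a whole-word match, but it is case-sensitive and does not strip punctuation.

```python
import pandas as pd

sr = ["senior", "sr"]
keywords = set(sr)

df = pd.DataFrame({
    "job_desig": ["senior analyst", "junior researcher", "scientist", "sr analyst"],
})

def has_keyword(title):
    """Return 1 if any whitespace-separated token of title is a keyword."""
    # isdisjoint is True when no token is shared, so negate it.
    return int(not keywords.isdisjoint(title.split()))

df["senior_profile"] = [has_keyword(x) for x in df["job_desig"]]
print(df)
```

A title such as "sr. analyst" would not be flagged here, since the token "sr." differs from "sr".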
Source: "python - How to check if a text column contains specific strings in pandas", from a similar question on Stack Overflow: https://stackoverflow.com/questions/56302411/