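I need to match the keywords that appear in a pandas column against the keywords in a list, and create a new column containing the matched words. Example: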
my_list = ['machine learning', 'artificial intelligence', 'lasso']
Data:
listing                                      keyword_column
I am looking for machine learning expert     machine learning
Machine learning expert that knows lasso     machine learning, lasso
Need a web designer
Artificial Intelligence application on...    artificial intelligence
Best answer
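Use Series.str.findall to get all values from the list that occur in each row, join them together with Series.str.join, and, if necessary, convert the result to lowercase with Series.str.lower.
Word boundaries (\b) are also used here so that only whole words from my_list are matched: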
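import re
my_list = ['machine learning', 'artificial intelligence', 'lasso']
# Join the keywords into one alternation; \b keeps matches to whole words,
# and re.escape guards against regex metacharacters inside a keyword.
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in my_list)
# Find all matches case-insensitively, join them with ', ', then lowercase.
df['new'] = df['listing'].str.findall(pat, flags=re.I).str.join(', ').str.lower()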
Or:
# Lowercase the text first; no re.I flag is needed because my_list is already lowercase.
df['new'] = df['listing'].str.lower().str.findall(pat).str.join(', ')
print(df)
                                    listing           keyword_column  \
0  I am looking for machine learning expert         machine learning
1  Machine learning expert that knows lasso  machine learning, lasso
2                       Need a web designer                      NaN
3    Artificial Intelligence application on  artificial intelligence

                       new
0         machine learning
1  machine learning, lasso
2
3  artificial intelligence
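
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's table, and the `extra` Series is a made-up string (not from the question) added only to show what the \b word boundaries do. The use of re.escape is a precaution rather than something these particular keywords require.

```python
import re

import pandas as pd

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Sample data reconstructed by hand from the question's table.
df = pd.DataFrame({
    'listing': [
        'I am looking for machine learning expert',
        'Machine learning expert that knows lasso',
        'Need a web designer',
        'Artificial Intelligence application on',
    ]
})

# One alternation over all keywords, each wrapped in word boundaries.
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in my_list)

# findall returns a list of matches per row; join turns it into a string.
df['new'] = df['listing'].str.findall(pat, flags=re.I).str.join(', ').str.lower()
print(df['new'].tolist())
# ['machine learning', 'machine learning, lasso', '', 'artificial intelligence']

# Hypothetical extra row: 'lassoed' contains 'lasso' but is not a whole word,
# so only 'machine learning' is matched.
extra = pd.Series(['He lassoed the machine learning budget'])
extra_matches = extra.str.findall(pat, flags=re.I).str.join(', ').str.lower()
print(extra_matches.tolist())
# ['machine learning']
```

Rows with no match come back as an empty string rather than NaN, because joining an empty list yields ''. If NaN is preferred, replacing '' afterwards is one option.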
Regarding python - Pandas matching elements from a list, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56752417/