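python - Pandas: matching elements from a list

Tags: python, pandas

I need to match the keywords that appear in a pandas column against the keywords in a list, and create a new column containing the matched words. Example: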

my_list = ['machine learning', 'artificial intelligence', 'lasso']


listing                                         keyword_column
I am looking for machine learning expert        machine learning
Machine learning expert that knows lasso        machine learning, lasso
Need a web designer                              
Artificial Intelligence application on...       artificial intelligence


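Use Series.str.findall to extract every keyword from the list that occurs in each row, join the matches with Series.str.join, and, if necessary, convert the result to lowercase with Series.str.lower.

Each keyword is also wrapped in \b word boundaries, so that only whole words from my_list are matched.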
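To see what the word boundaries change, here is a minimal sketch that uses only the standard re module. The sample text is made up for illustration, and re.escape is an extra precaution that is not part of the answer's code; it only matters if a keyword contains regex metacharacters.

```python
import re

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Without word boundaries, a keyword also matches inside a longer word.
loose = '|'.join(re.escape(x) for x in my_list)
# With \b on both sides, a keyword matches only as a whole word.
strict = '|'.join(r"\b{}\b".format(re.escape(x)) for x in my_list)

# Hypothetical listing text: "glasso" contains "lasso" as a substring.
text = "Graphical glasso and lasso models"

loose_hits = re.findall(loose, text, flags=re.I)
strict_hits = re.findall(strict, text, flags=re.I)

print(loose_hits)   # ['lasso', 'lasso']
print(strict_hits)  # ['lasso']
```

The pandas code that follows applies the same kind of alternation pattern to every row of the column.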

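import re

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Build one alternation pattern; \b keeps each keyword a whole-word match
pat = '|'.join(r"\b{}\b".format(x) for x in my_list)
# Find all matches case-insensitively, join them, then lowercase the result
df['new'] = df['listing'].str.findall(pat, flags=re.I).str.join(', ').str.lower()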


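Alternatively, since the keywords in my_list are already lowercase, lowercase the column first and then match without flags:

df['new'] = df['listing'].str.lower().str.findall(pat).str.join(', ')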

print(df)
                                    listing           keyword_column  \
0  I am looking for machine learning expert         machine learning
1  Machine learning expert that knows lasso  machine learning, lasso
2                       Need a web designer                      NaN
3    Artificial Intelligence application on  artificial intelligence

                       new
0         machine learning
1  machine learning, lasso
2
3  artificial intelligence
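For anyone who wants to reproduce this end to end, below is a self-contained sketch that rebuilds the sample data from the question. Two things in it are assumptions beyond the answer: re.escape is added in case a keyword contains regex metacharacters, and the new_or_nan column (a hypothetical name) shows one way to turn the empty string of a non-matching row into NaN.

```python
import re
import pandas as pd

my_list = ['machine learning', 'artificial intelligence', 'lasso']

# Sample rows taken from the question's "listing" column.
df = pd.DataFrame({
    'listing': [
        'I am looking for machine learning expert',
        'Machine learning expert that knows lasso',
        'Need a web designer',
        'Artificial Intelligence application on',
    ]
})

# One alternation pattern; each keyword is escaped and wrapped in \b.
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in my_list)

# findall returns a list of matches per row; join and lowercase them.
matches = df['listing'].str.findall(pat, flags=re.I)
df['new'] = matches.str.join(', ').str.lower()

# Rows without any match end up as '', so mask them to get NaN instead.
df['new_or_nan'] = df['new'].where(df['new'] != '')

print(df['new'].tolist())
```

Rows with no match produce an empty list from findall, which str.join turns into an empty string rather than NaN; the where call is only needed if missing values are preferred.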

