python - Search for keywords in a DataFrame cell

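I have a DataFrame with a column that contains words or characters, and I am trying to categorize each row by searching for keywords in the corresponding cell.

Example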

  words             |   category
-----------------------------------
im a test email     |  email
here is my handout  |  handout

This is what I have so far:

import numpy as np

# Each condition is a case-insensitive regex search on the 'words' column.
conditions = [
    df['words'].str.contains('flyer', case=False, regex=True),
    df['words'].str.contains('report', case=False, regex=True),
    df['words'].str.contains('form', case=False, regex=True),
    df['words'].str.contains('scotia', case=False, regex=True),
    df['words'].str.contains('news', case=False, regex=True),
    df['words'].str.contains(r'questions.*\.pdf', case=False, regex=True),
    # ...
]
# choices[i] is the category assigned when conditions[i] is the first to match.
choices = [
    'open house flyer',
    'report',
    'form',
    'report',
    'news',
    'question',
    # ...
]
df['category'] = np.select(conditions, choices, default='others')

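This works fine, but the problem is that I have a lot of keywords (probably more than 120), so maintaining this list is very difficult. Is there a better way to do this? By the way, I am using Python 3.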

Note: I am looking for an easier way to manage a large number of keywords, which is different from simply finding keywords in a cell.

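Best answer

You can join all the keywords into one pattern and use str.findall (in case a row contains more than one keyword), then map the result to a dict of conditions vs. choices: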

import pandas as pd

df = pd.DataFrame({"words":["im a test email",
                            "here is my handout",
                            "This is a flyer"]})

# Keyword -> category; the keys are joined into a single alternation pattern.
choices = {"flyer":"open house flyer",
           "email":"email from someone",
           "handout":"some handout"}

df["category"] = df["words"].str.findall("|".join(choices.keys())).str.join(",").map(choices)

print(df)

# Output:
                words            category
0     im a test email  email from someone
1  here is my handout        some handout
2     This is a flyer    open house flyer
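
One caveat with the snippet above: a row that matches several keywords is joined into a string such as "flyer,email", which is not a key in the dict, so map returns NaN for it. A row with no match also ends up as NaN, and the search is case-sensitive. The following is a minimal sketch of one possible alternative, not the answerer's method. It keeps the question's np.select logic (first match wins, default 'others') but builds the conditions from a single mapping. The name keyword_to_category and the sample rows are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including a multi-keyword row and a no-match row.
df = pd.DataFrame({"words": ["im a test email",
                             "here is my handout",
                             "This is a FLYER and an email",
                             "questions_week1.pdf",
                             "nothing relevant"]})

# Hypothetical regex pattern -> category mapping.
# Dict order sets the priority: the first matching pattern wins.
keyword_to_category = {
    "flyer": "open house flyer",
    "email": "email from someone",
    "handout": "some handout",
    r"questions.*\.pdf": "question",
}

# One boolean Series per pattern; na=False treats missing cells as non-matching.
conditions = [
    df["words"].str.contains(pattern, case=False, regex=True, na=False)
    for pattern in keyword_to_category
]

# np.select picks the category of the first condition that is True per row.
df["category"] = np.select(conditions,
                           list(keyword_to_category.values()),
                           default="others")

print(df)
```

With this layout, adding or changing a keyword means editing one dict entry, and the mapping could just as well be loaded from an external file if that is easier to maintain.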

Regarding searching for keywords in DataFrame cells, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/58742006/
