python - Pandas - Check whether column labels appear in another column's values and update those columns

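I have a long glossary and want to check whether each glossary term appears in a paragraph, marking 1 for yes and 0 for no. A simplified version: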

>>> glossary = ['phrase 1', 'phrase 2', 'phrase 3']
>>> glossary
['phrase 1', 'phrase 2', 'phrase 3']

>>> import pandas as pd
>>> df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
...                    'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'], columns=['text'])
>>> df
                                text
0        This is a phrase 1 and phrase 2
1                               phrase 1
2                               phrase 3
3  phrase 1 & phrase 2. phrase 3 as well

With the glossary terms joined on as columns, it looks like this:

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2       NaN       NaN       NaN
1                               phrase 1       NaN       NaN       NaN
2                               phrase 3       NaN       NaN       NaN
3  phrase 1 & phrase 2. phrase 3 as well       NaN       NaN       NaN
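The question doesn't show how the empty glossary columns were added. One way to get that intermediate frame (an assumption, not necessarily what the asker did) is DataFrame.reindex, which creates any missing column labels and fills them with NaN:

```python
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame(['This is a phrase 1 and phrase 2', 'phrase 1',
                   'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well'],
                  columns=['text'])

# reindex keeps 'text' and adds every glossary term as a new all-NaN column
df = df.reindex(columns=['text'] + glossary)
print(df)
```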

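I want each glossary column compared against the text column, set to 1 if the term appears in the text and 0 if it does not. For this example the result would be: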

                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1

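How can I do this? My real DataFrame has about 3000 glossary columns, so I'd also like the logic to be general: use each column label as the key to search for in the corresponding row's text.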
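Since the flag columns already exist, a minimal sketch that uses each column label itself as the search key could look like this. It assumes the text lives in a column named 'text' and that every other column is a glossary term. All flags are built into one DataFrame and joined once, rather than assigning roughly 3000 columns one at a time:

```python
import pandas as pd

# Frame shaped like the question's: a text column plus one NaN column per term
df = pd.DataFrame({'text': ['This is a phrase 1 and phrase 2', 'phrase 1',
                            'phrase 3', 'phrase 1 & phrase 2. phrase 3 as well']})
df = df.reindex(columns=['text', 'phrase 1', 'phrase 2', 'phrase 3'])

# Assumption: every column except 'text' is a glossary term
terms = df.columns.drop('text')

# Each column label is used literally (regex=False) as the substring to look for
flags = pd.DataFrame(
    {t: df['text'].str.contains(t, regex=False) for t in terms},
    index=df.index,
).astype(int)

# Swap the NaN placeholders for the 0/1 flags, keeping the column order
df = df[['text']].join(flags)
print(df)
```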

Best answer

You can build a list comprehension with str.contains, combine the results with concat, and cast to int to get a 0/1 DataFrame:

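import pandas as pd

# One boolean Series per term; regex=False matches each term literally
L = [df['text'].str.contains(x, regex=False) for x in glossary]
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)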
   phrase 1  phrase 2  phrase 3
0         1         1         0
1         1         0         0
2         0         0         1
3         1         1         1
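One caveat: str.contains does plain substring matching, so a term like 'phrase 1' would also be flagged inside 'phrase 10'. If only whole-word matches are wanted (an assumption about the desired behavior, not part of the original answer), a sketch is to escape each term with re.escape and wrap it in \b word boundaries. The sample texts below are made up for illustration:

```python
import re
import pandas as pd

glossary = ['phrase 1', 'phrase 2', 'phrase 3']
df = pd.DataFrame({'text': ['phrase 10 only', 'phrase 1.', 'see phrase 3']})

# re.escape neutralises regex metacharacters in the terms;
# \b on both sides requires a word boundary around the whole term
L = [df['text'].str.contains(r'\b' + re.escape(x) + r'\b') for x in glossary]
df1 = pd.concat(L, axis=1, keys=glossary).astype(int)
print(df1)
```

Note that \b only behaves as expected when a term starts and ends with a word character; terms such as 'C++' would need a different boundary rule.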

Then join the flags back onto the original DataFrame:

df = df.join(df1)
print(df)
                                    text  phrase 1  phrase 2  phrase 3
0        This is a phrase 1 and phrase 2         1         1         0
1                               phrase 1         1         0         0
2                               phrase 3         0         0         1
3  phrase 1 & phrase 2. phrase 3 as well         1         1         1

A similar question, "python - Pandas - Check whether column labels appear in another column's values and update those columns", can be found on Stack Overflow: https://stackoverflow.com/questions/47952632/
