python - Pandas - DataFrame with lists - find all rows that match a string in any column

I have the following DataFrame:

ID     col1        col2     col3
0     ['a','b'] ['d','c'] ['e','d']
1     ['s','f'] ['f','a'] ['d','aaa']

Given an input string 'a', I want to get a DataFrame like this (1 where the list in that cell contains 'a', 0 otherwise):

ID     col1      col2     col3
0       1          0        0
1       0          1        0

I know how to do this with a for loop, but it takes a long time, and there must be a method I'm missing.
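
For reference, a Python-level loop baseline might look like the sketch below. The asker's actual loop isn't shown, so the helper name flag_matches_loop is hypothetical; the sample data is the DataFrame from the question.

```python
import pandas as pd

# Sample data from the question: every cell holds a Python list
df = pd.DataFrame({
    'ID': [0, 1],
    'col1': [['a', 'b'], ['s', 'f']],
    'col2': [['d', 'c'], ['f', 'a']],
    'col3': [['e', 'd'], ['d', 'aaa']],
})


def flag_matches_loop(frame, target):
    """Hypothetical loop baseline: 1 where target is an element of the cell's list, else 0."""
    data = frame.set_index('ID')
    result = {}
    for col in data.columns:
        # Plain Python iteration over every cell; this is the slow part on large frames
        result[col] = [int(target in cell) for cell in data[col]]
    return pd.DataFrame(result, index=data.index)


print(flag_matches_loop(df, 'a'))
```

Every cell is visited by the interpreter one at a time, which is why this scales poorly compared to the reshaping approaches in the answer.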

Best answer

Processing lists stored in pandas cells is not vectorized, so it performs worse than working with scalar values.

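The first idea is to reshape the columns into a Series with DataFrame.stack, turn the lists into scalars with Series.explode so every element can be compared with 'a', use any, grouped by the first two index levels (ID and column name), to test whether any element matches, and finally reshape back with unstack and convert the boolean mask to integers: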

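import pandas as pd
# Series.any(level=...) was removed in pandas 2.0; groupby(level=...).any() performs the same reduction
df1 = df.set_index('ID').stack().explode().eq('a').groupby(level=[0, 1]).any().unstack().astype(int)
print(df1)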
    col1  col2  col3
ID                  
0      1     0     0
1      0     1     0
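
To see what each link of that chain produces, the sketch below splits it into named steps (the intermediate names stacked, exploded, mask and per_cell are illustrative only). It uses groupby(level=...).any() rather than any(level=...), since the latter no longer exists in pandas 2.x.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': [0, 1],
    'col1': [['a', 'b'], ['s', 'f']],
    'col2': [['d', 'c'], ['f', 'a']],
    'col3': [['e', 'd'], ['d', 'aaa']],
})

# Step 1: move every list cell into one Series indexed by (ID, column name)
stacked = df.set_index('ID').stack()
# Step 2: give each list element its own row; the (ID, column) index label is repeated
exploded = stacked.explode()
# Step 3: the comparison now runs on scalars, so it is vectorized
mask = exploded.eq('a')
# Step 4: collapse back to one boolean per (ID, column) pair
per_cell = mask.groupby(level=[0, 1]).any()
# Step 5: pivot the column level back out and turn True/False into 1/0
df1 = per_cell.unstack().astype(int)

print(exploded)
print(df1)
```

An empty list becomes a missing value after explode, which compares unequal to 'a', so such a cell simply ends up as 0.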

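Alternatively, use DataFrame.applymap to test each element with a lambda that uses in (in pandas 2.1 and later, applymap is deprecated in favor of the equivalent DataFrame.map):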

df1 = df.set_index('ID').applymap(lambda x: 'a' in x).astype(int)
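
Note that in on a list tests element equality, not substrings, which is why the ['d','aaa'] cell is 0 in the expected output. The sketch below shows the same exact-match test written with Series.map per column (avoiding the applymap deprecation), next to a hypothetical substring variant that the question does not ask for.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': [0, 1],
    'col1': [['a', 'b'], ['s', 'f']],
    'col2': [['d', 'c'], ['f', 'a']],
    'col3': [['e', 'd'], ['d', 'aaa']],
})
data = df.set_index('ID')

# Exact element match: the same test the applymap lambda performs
exact = data.apply(lambda col: col.map(lambda cell: 'a' in cell)).astype(int)

# Hypothetical substring variant: 1 if 'a' occurs inside any element of the list
substring = data.apply(
    lambda col: col.map(lambda cell: any('a' in s for s in cell))
).astype(int)

print(exact)
print(substring)
```

With the substring variant, the 'aaa' element in row 1, col3 also counts as a match.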

Or build a DataFrame from each list column, so it can be compared with 'a' and tested with DataFrame.any:

f = lambda x: pd.DataFrame(x.tolist(), index=x.index).eq('a').any(axis=1)
df1 = df.set_index('ID').apply(f).astype(int)
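
This approach relies on pd.DataFrame padding shorter lists when the lists differ in length. The sketch below uses hypothetical ragged data (not from the question) to check that the padding cells, which hold missing values, compare unequal to 'a' and so do not produce false matches.

```python
import pandas as pd

# Hypothetical data with lists of different lengths (not from the question)
df_ragged = pd.DataFrame({
    'ID': [0, 1],
    'col1': [['a'], ['s', 'f', 'a']],
    'col2': [['d', 'c'], ['d']],
})

# Expanding one list column: shorter lists are padded with missing values
col = df_ragged.set_index('ID')['col1']
expanded = pd.DataFrame(col.tolist(), index=col.index)
print(expanded)

# Same function as in the answer, applied to every list column
f = lambda x: pd.DataFrame(x.tolist(), index=x.index).eq('a').any(axis=1)
result = df_ragged.set_index('ID').apply(f).astype(int)
print(result)
```
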

Regarding "python - Pandas - DataFrame with lists - find all rows that match a string in any column", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/67821267/
