python - Counting across DataFrame columns based on str.contains (or similar)

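I want to count, for each row, the number of cells that contain a particular string. A cell that contains the string more than once should be counted only once.

I can count the cells in a row that are equal to a given value, but when I extend this logic to use str.contains I run into problems, as shown below.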


import pandas as pd

d = {'col1': ["a#", "b", "c#"], 'col2': ["a", "b", "c#"]}
df = pd.DataFrame(d)

# counting across each row using equality works
thisworks = (df == "a#").sum(axis=1)

# counting down a single column using str.contains works
thisworks1 = df['col1'].str.contains('#').sum()

# but a DataFrame has no .str accessor, so this raises AttributeError; what is the alternative?
thisdoesnt = (df.str.contains('#')).sum(axis=1)

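The output should be a Series showing, for each row, the number of cells that contain the given string.

Best answer

str.contains is a Series method. To apply it across a whole DataFrame you need agg or apply, for example: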

df.agg(lambda x: x.str.contains('#')).sum(1)

Out[2358]:
0    1
1    0
2    2
dtype: int64
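
As a slightly more defensive sketch of the same column-by-column idea: the helper name `count_cells_containing` is hypothetical, and the choices `regex=False` (match the string literally) and `na=False` (treat missing cells as non-matching) are assumptions about the desired behavior, not part of the original answer. It also assumes every column holds strings, since `.str` fails on numeric columns.

```python
import pandas as pd

def count_cells_containing(frame, needle):
    """Count, per row, the cells whose string value contains `needle`."""
    # str.contains is a Series method, so apply it one column at a time.
    # regex=False matches `needle` literally; na=False maps missing cells to False.
    mask = frame.apply(lambda col: col.str.contains(needle, regex=False, na=False))
    # Each cell contributes at most 1, however many times it contains `needle`.
    return mask.sum(axis=1)

# Sample data: "a##" contains '#' twice, and one cell is missing.
df = pd.DataFrame({'col1': ["a#", "b", "c#"], 'col2': ["a##", None, "c#"]})
counts = count_cells_containing(df, '#')
```

Here `counts` holds 2, 0, 2 for the three rows: the cell "a##" is counted once, and the missing cell is not counted at all.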

If you prefer neither agg nor apply, you can use np.char.find to work directly on df's underlying NumPy array:

import numpy as np
(np.char.find(df.values.tolist(), '#') + 1).astype(bool).sum(1)

Out[2360]: array([1, 0, 2])

To get a Series (or a new column of df), wrap the result and pass df's index:

pd.Series((np.char.find(df.values.tolist(), '#') + 1).astype(bool).sum(1), index=df.index)

Out[2361]:
0    1
1    0
2    2
dtype: int32
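
The NumPy approach can also be written with an explicit comparison against -1, which is the value np.char.find returns when nothing is found. The following is a minimal sketch, assuming every cell is a string; the explicit cast to int64 is an added assumption, used so that the result dtype does not depend on the platform or NumPy version (the int32 shown above is one such platform-dependent result).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ["a#", "b", "c#"], 'col2': ["a", "b", "c#"]})

# Convert the cells to a NumPy string array (assumes every cell is a string).
cells = df.to_numpy().astype(str)

# np.char.find returns the index of the first match, or -1 when there is none.
found = np.char.find(cells, '#') >= 0

# Sum the boolean mask across columns and fix the integer dtype explicitly.
counts = pd.Series(found.sum(axis=1).astype('int64'), index=df.index)
```

Comparing with `>= 0` gives the same boolean mask as adding 1 and casting to bool, but states the intent more directly.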

This question on counting across DataFrame columns based on str.contains (or similar) comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/56351383/
