python - Count occurrences of a string in an entire Pandas column

Tags: python regex pandas dataframe contains

Consider the following DataFrame:

import pandas as pd
df = pd.DataFrame(["What is the answer", 
                   "the answer isn't here, but the answer is 42" , 
                   "dogs are nice", 
                   "How are you"], columns=['words'])
df
                                         words
0                           What is the answer
1  the answer isn't here, but the answer is 42
2                                dogs are nice
3                                  How are you

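I want to count how many times a given string occurs in the column; it may be repeated several times within a single row.

For example, I want to count how many times "the answer" occurs. I tried: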

df.words.str.contains(r'the answer').count()

I expected the number of occurrences, but the output is 4, and I don't understand why. "the answer" appears only 3 times:
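The following is a minimal sketch of where the 4 comes from, rebuilding the DataFrame from the question. It assumes only standard pandas behaviour: str.contains yields one boolean per row, and Series.count() counts non-null values, not True values.

```python
import pandas as pd

df = pd.DataFrame(
    ["What is the answer",
     "the answer isn't here, but the answer is 42",
     "dogs are nice",
     "How are you"],
    columns=["words"],
)

# One boolean per row: True if the pattern occurs at least once in that row.
mask = df.words.str.contains(r"the answer")
print(mask.tolist())  # [True, True, False, False]

# Series.count() counts non-null entries, so it returns the number of rows.
print(mask.count())   # 4

# Summing the booleans counts matching rows, still at most 1 per row.
print(mask.sum())     # 2
```

Neither count() nor sum() on this mask can reach 3, because the mask discards how many times the pattern occurs inside a row.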

What is **the answer**
**the answer** isn't here, but **the answer** is 42

Note: the search string may occur more than once in a row.
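For comparison, here is a plain-Python baseline (no pandas) that gives the expected total of 3. It uses the built-in str.count, which counts non-overlapping occurrences within each string. This is an illustration of the expected result, not the accepted answer's method.

```python
rows = [
    "What is the answer",
    "the answer isn't here, but the answer is 42",
    "dogs are nice",
    "How are you",
]

# Count occurrences inside each row, then add the per-row counts together.
per_row = [row.count("the answer") for row in rows]
print(per_row)       # [1, 2, 0, 0]
print(sum(per_row))  # 3

# Matches do not overlap: "aa" is found twice in "aaaa", not three times.
print("aaaa".count("aa"))  # 2
```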

Best answer

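You need str.count. str.contains returns one boolean per row, and Series.count() counts the non-null values in that Series, so your expression returns the number of rows (4) rather than the number of matches. str.count instead counts the matches in each row, and summing those counts gives the total: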

In [5285]: df.words.str.count("the answer").sum()
Out[5285]: 3

In [5286]: df.words.str.count("the answer")
Out[5286]:
0    1
1    2
2    0
3    0
Name: words, dtype: int64
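Below is a self-contained version of the answer with two additions that are assumptions on my part, not part of the original answer. Series.str.count treats its argument as a regular expression, so re.escape is used to match the search string literally. The flags parameter is used for a case-insensitive count. The small Series s is a hypothetical example showing why escaping matters.

```python
import re

import pandas as pd

df = pd.DataFrame(
    ["What is the answer",
     "the answer isn't here, but the answer is 42",
     "dogs are nice",
     "How are you"],
    columns=["words"],
)

needle = "the answer"
# Escape the needle so any regex metacharacters in it are matched literally.
counts = df.words.str.count(re.escape(needle))
total = counts.sum()
print(counts.tolist())  # [1, 2, 0, 0]
print(total)            # 3

# Case-insensitive count via the flags parameter of Series.str.count.
ci_total = df.words.str.count("THE ANSWER", flags=re.IGNORECASE).sum()
print(ci_total)         # 3

# Why escaping matters: an unescaped "." matches any character.
s = pd.Series(["a.b", "axb"])
print(s.str.count(".").tolist())             # [3, 3]
print(s.str.count(re.escape(".")).tolist())  # [1, 0]
```

For a plain alphanumeric phrase like "the answer", escaping changes nothing. It only matters when the search string contains characters such as ".", "?", "(" or "+".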

A similar question about counting occurrences of a string in an entire Pandas column can be found on Stack Overflow: https://stackoverflow.com/questions/46893629/
