So, to count occurrences of a single-word substring in some text, I can use some_text.split().count(single_word_substring). How can I do the same for a multi-word substring?
Examples:

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to school'
The count should be 3.

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'going to'
The count should be 3.

text = 'he is going to school. abc is going to school. xyz is going to school.'
to_be_found = 'go'
The count should be 0.

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'school'
The count should be 3.

text = 'he is going to school. abc-xyz is going to school. xyz is going to school.'
to_be_found = 'abc-xyz'
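The count should be 1.

Assumption 1: Everything is lowercase.
Assumption 2: The text can contain anything.
Assumption 3: The string to be found can also contain anything, for example 'car with 4 passengers', 'xyz & abc', and so on.

Note: A regex-based solution is acceptable. I am just curious whether this can be done without regex (a nice-to-have, and only for others who may be interested in this in the future).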
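Best answer

Here is a working solution using regular expressions:

import re

def occurrences(text, to_be_found):
    # Escape the phrase so it matches literally; the lookarounds reject matches glued to a word character.
    return len(re.findall(rf'(?<!\w){re.escape(to_be_found)}(?!\w)', text))

In a regular expression, lowercase \w matches a word character (letters, digits, underscore) and uppercase \W matches anything else, including whitespace and punctuation. Wrapping the phrase in \W on both sides would miss an occurrence at the very start or end of the text, because \W has to consume a character. The lookarounds (?<!\w) and (?!\w) only check that no word character sits next to the match, so those cases are counted too. re.escape is needed because the phrase may contain characters such as '.' or '&' that would otherwise be read as regex syntax.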
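
The question also asks whether this can be done without regex. The following is a minimal sketch of one way to do it, not a definitive implementation. It assumes that a "word character" means a letter, digit, or underscore (roughly what \w matches) and that occurrences should not overlap. The names is_word_char and count_phrase are made up for this illustration.

```python
def is_word_char(ch):
    # Rough equivalent of the regex class \w: letters, digits, underscore.
    return ch.isalnum() or ch == "_"


def count_phrase(text, phrase):
    """Count non-overlapping occurrences of phrase that are not glued to a word character."""
    if not phrase:
        return 0
    count = 0
    start = text.find(phrase)
    while start != -1:
        end = start + len(phrase)
        # A candidate counts only if both neighbours are absent or non-word characters.
        left_ok = start == 0 or not is_word_char(text[start - 1])
        right_ok = end == len(text) or not is_word_char(text[end])
        if left_ok and right_ok:
            count += 1
            next_pos = end        # skip past the counted match (no overlaps)
        else:
            next_pos = start + 1  # rejected candidate: retry one character later
        start = text.find(phrase, next_pos)
    return count


text = "he is going to school. abc-xyz is going to school. xyz is going to school."
print(count_phrase(text, "going to school"))  # 3
print(count_phrase(text, "go"))               # 0
print(count_phrase(text, "abc-xyz"))          # 1
```

Because it relies only on str.find and a character check, the phrase is always treated literally, so nothing needs escaping. If a different notion of word boundary is wanted (for example, treating '-' as part of a word), only is_word_char has to change.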
Regarding "python - counting occurrences of a multi-word substring in some text", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/65928241/