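I have a text file containing more than 100 paragraphs. I want to find and list the words that contain a particular string.
Here is the content of my text file: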
A computer is a general purpose device that can be programmed to carry out a set of arithmetic or logical operations automatically. Since a sequence of operations can be readily changed, the computer can solve more than one kind of problem.
I want to retrieve the words that contain ra. It should return general, programmed, and operations.
Here is my code:
with open('computer.txt', 'r') as searchfile:
    for line in searchfile:
        # Skip lines that cannot contain a match
        if "ra" in line:
            line_split = line.split(' ')
            for each in line_split:
                # Print every space-separated token containing "ra"
                if "ra" in each:
                    print(each)
What is the most efficient way to do this?
Best answer
A regular expression works well here:
>>> import re
>>> r = re.compile(r"\b\w*ra\w*\b")
>>> r.findall("A computer is a general purpose device that can be programmed to carry out a set of arithmetic or logical operations automatically. Since a sequence of operations can be readily changed, the computer can solve more than one kind of problem.")
['general', 'programmed', 'operations', 'operations']
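This list contains duplicates, which can be removed with a simple set() call (that in turn discards the order of the elements, so if you need to preserve it, a little more work is required).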
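If the order of first appearance matters, one possible approach is to track the words already seen while scanning. The following is only a sketch under assumptions: the function name unique_words_containing and the sample_lines list are hypothetical and not part of the original answer, and re.escape is added so that a search string containing regex metacharacters is treated literally.

```python
import re

def unique_words_containing(lines, fragment):
    """Return words containing `fragment`, without duplicates, in first-seen order.

    Hypothetical helper for illustration; `lines` is any iterable of strings.
    """
    # Same pattern as in the answer, with the fragment escaped so it is literal.
    pattern = re.compile(r"\b\w*" + re.escape(fragment) + r"\w*\b")
    seen = set()      # fast membership test
    ordered = []      # keeps the order of first appearance
    for line in lines:
        for word in pattern.findall(line):
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered

# Sample input standing in for the lines of the text file.
sample_lines = [
    "A computer is a general purpose device that can be programmed to carry "
    "out a set of arithmetic or logical operations automatically.\n",
    "Since a sequence of operations can be readily changed, the computer can "
    "solve more than one kind of problem.\n",
]
result = unique_words_containing(sample_lines, "ra")
print(result)  # ['general', 'programmed', 'operations']
```

Because an open file object yields its lines when iterated, the same function should also accept the file from the question directly, for example by passing the object returned by open('computer.txt') inside a with block.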
Note that the regex has a rather naive idea of what a "word" is:
\b # Start of an alphanumeric word
\w* # Match any number of word characters [A-Za-z0-9_]
ra # Match ra
\w* # Match any number of word characters
\b # End of a word
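To make that limitation concrete, here is a small sketch using a made-up sentence (the sample string is an assumption, not taken from the question). A hyphen is not a word character, so a hyphenated term is split at the hyphen, while an underscore is a word character and keeps the two parts together.

```python
import re

pattern = re.compile(r"\b\w*ra\w*\b")

# Hypothetical sample sentence chosen to show the edge cases.
sample = "An extra-large ultra_wide screen"
naive_matches = pattern.findall(sample)

# "extra-large" is cut at the hyphen, so only "extra" is returned;
# "ultra_wide" is returned whole because "_" counts as a word character.
print(naive_matches)  # ['extra', 'ultra_wide']
```

If hyphenated terms should count as single words, the character class in the pattern would need to be widened accordingly.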
Regarding "python - Get the whole word if a few characters exist in the text file content", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/26273796/