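I'm trying to write a simple word counter program in Python 3.4.1, where the user enters a comma-separated list of words, and the frequency of those words is then analyzed in a sample text file.
I'm currently stuck on how to search the text file for the entered list of words.
First I tried: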
file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])
This resulted in:
What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1
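If anything, I'm guessing this approach only counted the words in the input list itself, not how often the input words occur in the text file. So then I tried: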
file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])
This gave me nothing in return. Here is what happened:
What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>>
What am I doing wrong? How can I fix this?
Best answer
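You first read all the lines (into the list named lines), and then try to read just one more line, but the file has already given you all of its lines. In that case f.readline() returns an empty string. From there on your script is doomed to fail; you cannot count words in an empty line.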
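The behaviour described above can be reproduced without a real file. The following is a minimal sketch in which io.StringIO stands in for the opened file, and the two lines of text are made-up sample data, not the contents of twelve_days_of_fast_food.txt. It also shows a second reason the loop in the question finds nothing: each element of lines is a whole line, newline included, so it never equals a single search word.

```python
import io

# io.StringIO stands in for the opened file; the two lines are made-up sample text.
f = io.StringIO("on the first day\nfive onion rings\n")

lines = f.readlines()  # reads everything up to the end of the stream
line = f.readline()    # the stream is already exhausted

print(len(lines))   # 2
print(repr(line))   # ''

# Each element of lines is a whole line, newline included, so it never
# equals a single search word such as "first".
search = ["first", "rings", "the"]
matches = [entry for entry in lines if entry in search]
print(matches)      # []
```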
You can loop over the file instead:
file = input("What file would you like to open? ")
search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]
# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)
with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
                count[word] += 1
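The with statement uses the opened file object as a context manager; this just means the file is closed again automatically once the block is done.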
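To see the automatic closing in action, here is a small sketch. The temporary file and its one line of text are assumptions made only so the example runs on its own; any readable text file would behave the same way.

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on an existing one.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("five onion rings\n")
    path = tmp.name

with open(path, "r") as f:
    inside = f.closed   # False: the file is still open inside the block
    text = f.read()

outside = f.closed      # True: the file was closed on leaving the block
print(inside, outside)  # False True

os.remove(path)         # clean up the throwaway file
```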
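The for line in f: loop iterates over each individual line in the input file; this is more memory-efficient than using f.readlines() to read all the lines into memory at once.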
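I also cleaned up your search-word stripping a little, and set the count dictionary to one with all the search words predefined to 0; this makes the actual counting a little easier.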
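As a small illustration of the predefined counts, the sketch below uses a hard-coded search list and a made-up sentence in place of user input and file contents. Because every search word starts at 0, the counting step is a plain increment, and words that never occur still appear in the result with a count of 0.

```python
search = ["first", "rings", "the"]

# Every search word becomes a key with a starting count of 0.
count = dict.fromkeys(search, 0)
print(count)  # {'first': 0, 'rings': 0, 'the': 0}

# Made-up sample sentence standing in for one line of the file.
for word in "the first of the days".split():
    if word in count:
        count[word] += 1  # no count.get(word, 0) needed

print(count)  # {'first': 1, 'rings': 0, 'the': 2}
```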
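Because you now have a dictionary containing all the search words, it is best to test for matching words against that dictionary. Testing against a dictionary is faster than testing against a list (the more words in the list, the longer the latter scan takes, while a dictionary test takes constant time on average, no matter how many items are in the dictionary).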
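Putting the pieces together, the approach can be wrapped in a function that tests each word against the dictionary. This is only a sketch: the name count_words is hypothetical, io.StringIO with made-up text stands in for the real file, and stripping punctuation with string.punctuation is an addition not present in the answer above (without it, a token such as "rings." would not match "rings").

```python
import io
import string


def count_words(lines, search_words):
    """Count how often each search word occurs in an iterable of lines."""
    count = dict.fromkeys(search_words, 0)
    for line in lines:
        for word in line.lower().split():
            # Assumption, not part of the original answer: drop leading and
            # trailing punctuation so "rings." is counted as "rings".
            word = word.strip(string.punctuation)
            if word in count:  # dictionary membership test
                count[word] += 1
    return count


# Made-up sample text; any open text file object could be passed instead.
sample = io.StringIO("On the first day, the menu had\nfive onion rings.\n")
result = count_words(sample, ["first", "rings", "the"])
for word in sorted(result):
    print(word, result[word])
```

With a real file, the same function could be called as count_words(f, search) inside the with open(file, 'r') as f: block.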
Regarding python - How do I search a text file for a list of words entered by the user?, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27085244/