python - How to search a text file for a user-input list of words?

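I'm trying to make a simple word counter program in Python 3.4.1. The user enters a comma-separated list of words, and the program then counts how often those words occur in a sample text file.

I currently can't figure out how to search the text file for the entered list of words.

I first tried: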

file = input("What file would you like to open? ")
f = open(file, 'r')
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in search:
    count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This results in:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, rings, the
first 1
rings 1
the 1

If anything, I guess this approach only counts the words in the input list itself, not how often those input words occur in the text file. Then I tried:

file = input("What file would you like to open? ")
f = open(file, 'r')
lines = f.readlines()
line = f.readline()
word = line.split()
search = input("Enter the words you want to search for (separate with commas): ").lower().split(",")
search = [x.strip(' ') for x in search]
count = {}
for word in lines:
    if word in search:
        count[word] = count.get(word,0)+1
for word in sorted(count):
    print(word, count[word])

This gives me nothing back. Here is what happens:

What file would you like to open? twelve_days_of_fast_food.txt
Enter the words you want to search for (separate with commas): first, the, rings
>>>

What am I doing wrong? How can I fix this?

Best answer

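You first read all the lines (into lines), then try to read just one more line, but the file has already given you all of its lines. At that point f.readline() gives you an empty string. From there on your script is doomed to fail; you cannot count words in an empty line. On top of that, for word in lines: iterates over whole lines (each still ending in a newline character), not over individual words, so word in search is never true and count stays empty.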
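Both problems can be reproduced with a minimal sketch. Here io.StringIO stands in for the real file, and the two-line sample text is made up for illustration. After readlines() the file position is at the end, so readline() returns an empty string, and looping over lines yields complete lines, which never equal a single search word.

```python
import io

# Hypothetical sample text standing in for twelve_days_of_fast_food.txt
f = io.StringIO("On the first day of Christmas\nmy true love gave to me\n")

lines = f.readlines()   # consumes the whole file
line = f.readline()     # the file position is already at the end
print(repr(line))       # prints ''

search = ["first", "the", "rings"]
# Each item in lines is a complete line such as 'On the first day of Christmas\n',
# so comparing it to a single search word never matches.
matches = [word for word in lines if word in search]
print(matches)          # prints []
```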

You can loop over the file instead:

file = input("What file would you like to open? ")

search = input("Enter the words you want to search for (separate with commas): ")
search = [word.strip() for word in search.lower().split(",")]

# create a dictionary for all search words, setting each count to 0
count = dict.fromkeys(search, 0)

with open(file, 'r') as f:
    for line in f:
        for word in line.lower().split():
            if word in count:
                # found a word you wanted to count, so count it
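                count[word] += 1

# print the results, as in the question's original code
for word in sorted(count):
    print(word, count[word])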

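The with statement uses the opened file object as a context manager; this simply means that the file is closed again automatically once the block is done.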
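A small sketch of that automatic closing. It assumes a throwaway file created with tempfile, since the question's real file isn't available. The file object reports closed as False inside the block and True as soon as the block is left.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the first ring\n")
    path = tmp.name

with open(path, "r") as f:
    closed_inside = f.closed   # False: the file is still open inside the block
    content = f.read()

print(closed_inside, f.closed)  # prints False True: closed on leaving the block
os.remove(path)
```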

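The for line in f: loop iterates over each individual line of the input file; this is more efficient than using f.readlines() to read all lines into memory at once.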
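To see what that loop relies on, here is a sketch showing that a file object is an iterator over its lines. io.StringIO is used only to keep the example self-contained; unlike a real file it already holds all its text in memory, but it iterates the same way. next() pulls one line at a time, and a for loop simply keeps calling it.

```python
import io

# io.StringIO stands in for a real file opened with open(); both are line iterators
f = io.StringIO("first line\nsecond line\nthird line\n")

# next() pulls exactly one line from the file object
first = next(f)
print(repr(first))   # prints 'first line\n'

# A for loop keeps calling next() for you; each iteration only needs the
# current line, unlike f.readlines(), which builds a list of every line.
remaining = 0
for line in f:
    remaining += 1
print(remaining)     # prints 2
```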

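I also tidied up the stripping of your search words a little, and created the count dictionary in one step with dict.fromkeys(), pre-setting every search word to 0; that makes the actual counting easier.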
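A sketch of those two steps, using the input "first, rings, the" from the question. Splitting on commas leaves leading spaces that strip() removes, and dict.fromkeys() then gives every word a starting count of 0, so the loop can increment directly without count.get(word, 0).

```python
raw = "first, rings, the"   # what the user typed in the question's example

# lower() and split(",") give ['first', ' rings', ' the']; strip() with no
# argument removes any surrounding whitespace, not just spaces
search = [word.strip() for word in raw.lower().split(",")]
print(search)   # prints ['first', 'rings', 'the']

# dict.fromkeys() builds the dictionary in one step, every key starting at 0
count = dict.fromkeys(search, 0)
print(count)    # {'first': 0, 'rings': 0, 'the': 0} (this key order on Python 3.7+)

# Every search word is already a key, so a match can be counted directly
count["the"] += 1
```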

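Because you now have a dictionary containing all the search words, it's better to test each word against that dictionary. Testing membership in a dictionary is faster than testing against a list: scanning a list takes longer the more words it contains, whereas a dictionary lookup takes constant time on average, regardless of how many items the dictionary holds.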
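A rough way to see that difference yourself is to time it with timeit. This is a sketch with made-up data: 10,000 generated words, looking up the last one, which is the worst case for a list. The exact timings depend on your machine, so none are shown here.

```python
import timeit

# Hypothetical data: 10,000 search words; the target is the last one
words = ["word%d" % i for i in range(10000)]
as_list = list(words)
as_dict = dict.fromkeys(words, 0)
target = "word9999"

# 'in' on a list compares the target with each element in turn: O(n)
list_time = timeit.timeit(lambda: target in as_list, number=1000)
# 'in' on a dict hashes the key and probes the hash table: O(1) on average
dict_time = timeit.timeit(lambda: target in as_dict, number=1000)

print("list: %.4fs  dict: %.4fs" % (list_time, dict_time))
```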

Regarding python - How to search a text file for a user-input list of words?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27085244/
