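I need to display the 10 most common words in a text file, from most to least frequent, along with the number of times each one is used. I can't use a dictionary or Counter. So far I have this: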
import urllib
cnt = 0
i=0
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
uniques = []
for line in txtFile:
    words = line.split()
    for word in words:
        if word not in uniques:
            uniques.append(word)
for word in words:
    while i<len(uniques):
        i+=1
        if word in uniques:
            cnt += 1
print cnt
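Now I think I should look up each word in the array "uniques", see how many times it is repeated in the file, and add that to another array that counts the instances of each word. But this is where I'm stuck. I don't know how to proceed.
Any help would be appreciated. Thanks.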
Best answer
This is easy to solve with Python's collections module. Here is a solution:
from collections import Counter

data_set = "Welcome to the world of Geeks " \
    "This portal has been created to provide well written well " \
    "thought and well explained solutions for selected questions " \
    "If you like Geeks for Geeks and would like to contribute " \
    "here is your chance You can write article and mail your article " \
    " to contribute at geeksforgeeks org See your article appearing on " \
    "the Geeks for Geeks main page and help thousands of other Geeks. "

# split() returns a list of all the words in the string
split_it = data_set.split()

# Pass the split_it list to an instance of the Counter class.
Counters_found = Counter(split_it)

# most_common(k) returns the k most frequently encountered
# values and their respective counts, most frequent first.
# Use most_common(10) for the ten words the question asks for.
most_occur = Counters_found.most_common(4)
print(most_occur)
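Since the question rules out dictionaries and Counter, here is a minimal sketch of the same counting done with two parallel lists, roughly the "second array" idea from the question. The function name top_words and the sample sentence are made up for illustration; words are split on whitespace only, so case and punctuation are not normalized.

```python
def top_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    uniques = []  # each distinct word, in first-seen order
    counts = []   # counts[i] is how many times uniques[i] occurs

    for word in text.split():
        if word in uniques:
            # Seen before: bump the count stored at the same index.
            counts[uniques.index(word)] += 1
        else:
            # First occurrence: record the word with a count of 1.
            uniques.append(word)
            counts.append(1)

    # Pair each word with its count and sort by count, highest first.
    # sorted() is stable, so tied words keep their first-seen order.
    pairs = sorted(zip(uniques, counts), key=lambda pair: pair[1], reverse=True)
    return pairs[:n]


# Hypothetical sample text, used only to demonstrate the function.
sample = "the cat and the dog and the bird"
for word, count in top_words(sample, 3):
    print(word, count)
```

To apply this to the file from the question, read the whole text into a string first and pass it to top_words. The list lookups make this slower than Counter on large inputs, but it should be workable for a single book-length file.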
Regarding "python - 10 most common words in a Python string", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/27327303/