python - The 10 most common words in a string in Python

Tags: python

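I need to display the 10 most common words in a text file, ordered from most to least frequent, together with the number of times each one is used. I am not allowed to use dictionaries or the Counter class. So far I have this: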

import urllib
cnt = 0
i=0
txtFile = urllib.urlopen("http://textfiles.com/etext/FICTION/alice30.txt")
uniques = []
for line in txtFile:
    words = line.split()
    for word in words:
        if word not in uniques:
            uniques.append(word)
for word in words:
    while i<len(uniques):
        i+=1
        if word in uniques:
             cnt += 1
print cnt

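Now I think I should take each word in the "uniques" array, see how many times it is repeated in the file, and then add that to a second array that holds the count for each word. But this is where I am stuck. I don't know how to continue.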

Any help would be appreciated. Thanks.

Best answer

The problem above can be solved easily with Python's collections module. Here is a solution.

from collections import Counter

data_set = (
    "Welcome to the world of Geeks "
    "This portal has been created to provide well written well "
    "thought and well explained solutions for selected questions "
    "If you like Geeks for Geeks and would like to contribute "
    "here is your chance You can write article and mail your article "
    "to contribute at geeksforgeeks org See your article appearing on "
    "the Geeks for Geeks main page and help thousands of other Geeks. "
)

# split() returns list of all the words in the string
split_it = data_set.split()

# Pass the list of words to Counter, which tallies each distinct word.
Counters_found = Counter(split_it)

# most_common(k) returns the k most frequent values with their counts,
# ordered from most to least frequent (use 10 for the question's top ten).
most_occur = Counters_found.most_common(4)
print(most_occur)
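
The question rules out dictionaries and Counter, so here is a minimal sketch of the approach the asker describes: one list of unique words and a second, parallel list of counts. The function name top_words and the sample string are hypothetical, chosen only for illustration. In practice the text would be the contents of the downloaded file read into a single string.

```python
def top_words(text, k=10):
    """Return the k most frequent words in text as (word, count) pairs,
    using only lists (no dict, no Counter)."""
    uniques = []  # each distinct word, in order of first appearance
    counts = []   # counts[i] is the number of times uniques[i] occurs
    for word in text.split():
        if word in uniques:
            counts[uniques.index(word)] += 1
        else:
            uniques.append(word)
            counts.append(1)
    # Pair each word with its count and sort by descending count.
    # sorted() is stable, so tied words keep their first-appearance order.
    pairs = sorted(zip(uniques, counts), key=lambda pair: pair[1], reverse=True)
    return pairs[:k]


# Hypothetical sample text; replace with the contents of the real file.
sample = "the cat and the hat and the bat"
for word, count in top_words(sample, k=3):
    print(word, count)
```

Each membership test and index lookup scans the uniques list, so this is slower than a dictionary on large texts, but it stays within the stated constraints. Words are compared exactly as split() returns them, so "Geeks" and "Geeks." count as different words unless the text is normalized first.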

For "python - The 10 most common words in a string in Python", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/27327303/
