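As the title says, I need to write code that returns a list of the 5 words (from an input string) that have the highest frequency. This is what I have so far: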
from collections import defaultdict
def top5_words(text):
    tally = defaultdict(int)
    words = text.split()
    for word in words:
        if word in tally:
            tally[word] += 1
        else:
            tally[word] = 1
    answer = sorted(tally, key=tally.get, reverse = True)
    return(answer)
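For example, if you enter: top5_words("one one was a racehorse two two was one too") it should return: ["one", "two", "was", "a", "racehorse"] but instead it returns: ['one', 'was', 'two', 'racehorse', 'too', 'a'] - does anyone know why this is?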
EDIT:
This is what I have now, thanks to Anand S Kumar:
import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]
Best Answer
You should use collections.Counter, and then you can use its method most_common(). Example -
import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return counts.most_common(5)
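Please note, the above returns a list of 5 tuples; in each tuple, the first element is the actual word and the second element is the count of that word.
Demo -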
>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return counts.most_common(5)
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
[('that', 2), ('a', 2), ('I', 2), ('the', 2), ('have', 2)]
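The order of the tied words in the demo above depends on the interpreter, and the same effect explains the result in the question. As a rough sketch (assuming Python 3.7 or later, where `Counter.most_common()` returns equal counts in first-encountered order; the helper names `original_top5` and `counter_top5` are made up for this illustration), the following compares the question's approach with `most_common(5)` on the question's sample sentence, given here in English:

```python
import collections

SAMPLE = "one one was a racehorse two two was one too"

def original_top5(text):
    # The question's approach: sorted() returns every key,
    # and nothing trims the result down to five entries.
    tally = collections.defaultdict(int)
    for word in text.split():
        tally[word] += 1
    return sorted(tally, key=tally.get, reverse=True)

def counter_top5(text):
    # most_common(5) keeps only the five highest counts.
    counts = collections.Counter(text.split())
    return [word for word, _ in counts.most_common(5)]

print(len(original_top5(SAMPLE)))  # 6, because all six distinct words come back
print(counter_top5(SAMPLE))        # ['one', 'was', 'two', 'a', 'racehorse'] on Python 3.7+
```

So the question's function returned six words because it never sliced the sorted list, and neither version orders ties alphabetically on its own.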
If you only want the elements and not the counts, you can also use a list comprehension to get that information. Example -
import collections
def top5_words(text):
    counts = collections.Counter(text.split())
    return [elem for elem, _ in counts.most_common(5)]
Demo -
>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.split())
...     return [elem for elem, _ in counts.most_common(5)]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['that', 'a', 'I', 'the', 'have']
For the new requirement from the comments -
it seems there's still an issue when it comes to words with the same frequency, how would I get it to sort same frequency words alphabetically?
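You can first get the list of all words and their counts, and then use sorted so that it sorts first on the count (in descending order) and then on the element itself (so words with the same count are ordered lexicographically). Example -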
import collections
def top5_words(text):
    counts = collections.Counter(text.lower().split())
    return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]
Demo -
>>> import collections
>>> def top5_words(text):
...     counts = collections.Counter(text.lower().split())
...     return [elem for elem, _ in sorted(counts.most_common(),key=lambda x:(-x[1], x[0]))[:5]]
...
>>> top5_words("""As the title says, I need to write a code that returns a list of 5 words (from an input string) that have the highest frequency. This is what I have so far""")
['a', 'have', 'i', 'that', 'the']
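To check the tie-breaking key against the sample sentence from the question (given here in English), here is a small sketch. The name `top5_sorted` is hypothetical, and `heapq.nsmallest` is swapped in for `sorted(...)[:5]` as an optional variation that avoids sorting the whole list; it is not part of the answer above, and the key is the same one.

```python
import collections
import heapq

def top5_sorted(text):
    counts = collections.Counter(text.lower().split())
    # Key: highest count first (negated), then alphabetical among equal counts.
    best = heapq.nsmallest(5, counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in best]

print(top5_sorted("one one was a racehorse two two was one too"))
# ['one', 'two', 'was', 'a', 'racehorse']
```

This matches the list the question expected, since "two" sorts before "was" and "a" and "racehorse" sort before "too".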
Regarding "Python - Return top 5 words with highest frequency", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/32546245/