python - Counting unique words in Python

To get straight to the point, here is my code so far:

from glob import glob
pattern = "D:\\report\\shakeall\\*.txt"
filelist = glob(pattern)
def countwords(fp):
    # Count the whitespace-separated words in one file
    with open(fp) as fh:
        return len(fh.read().split())
print("There are %d words in the files. From directory %s" % (sum(map(countwords, filelist)), pattern))

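I'd like to add code that counts the unique words in the files matched by pattern (there are 42 txt files in this path), but I don't know how to do it. Can anyone help?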

Best answer

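The best way to count objects in Python is the collections.Counter class, which was created for exactly this purpose. It behaves like a Python dictionary, but is easier to use for counting: you just pass it a list of objects and it counts them for you automatically.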

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})
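Applied to the question, the key point is that the number of distinct keys in a Counter is the number of unique words, while the sum of its values is the total word count. A minimal Python 3 sketch, using a made-up sample sentence and plain str.split() (so no lowercasing or punctuation stripping is assumed):

```python
from collections import Counter

# Hypothetical sample text, split on whitespace only
text = "the cat and the hat and the bat"
counts = Counter(text.split())

# Total words: sum of all per-word counts
total = sum(counts.values())
# Unique words: number of distinct keys in the Counter
unique = len(counts)
print(total, unique)  # 8 5
```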

Counter also has some other useful methods, such as most_common; see the documentation for more.
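For example, a small Python 3 sketch of most_common on a made-up sentence; it returns (element, count) pairs ordered from most to least frequent:

```python
from collections import Counter

counts = Counter("the cat and the hat and the bat".split())
# most_common(n) returns the n most frequent (element, count) pairs, highest first
top_two = counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]
```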

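A very useful method of the Counter class is update. After creating a Counter from a list of objects, you can call update with another list, and it keeps counting, adding to the existing counts instead of discarding them: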

>>> from collections import Counter
>>> c = Counter(['hello', 'hello', 1])
>>> print(c)
Counter({'hello': 2, 1: 1})
>>> c.update(['hello'])
>>> print(c)
Counter({'hello': 3, 1: 1})
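Combining this with the glob code from the question, here is a minimal Python 3 sketch that reads every file matched by pattern and feeds its words into one shared Counter via update; the total word count is then sum(counts.values()) and the number of unique words is len(counts). The function name count_words, the UTF-8 encoding and errors="replace" are assumptions for illustration, not part of the original answer:

```python
from collections import Counter
from glob import glob


def count_words(paths):
    """Return a Counter mapping each word to its total count across all files.

    count_words is a hypothetical helper name; UTF-8 input is an assumption.
    """
    counts = Counter()
    for path in paths:
        # errors="replace" keeps one badly encoded byte from aborting the run
        with open(path, encoding="utf-8", errors="replace") as fh:
            # Read line by line so a large file is not loaded all at once
            for line in fh:
                # update() adds to the existing counts instead of resetting them
                counts.update(line.split())
    return counts


if __name__ == "__main__":
    pattern = "D:\\report\\shakeall\\*.txt"
    counts = count_words(glob(pattern))
    print("There are %d words in the files." % sum(counts.values()))
    print("There are %d unique words in the files." % len(counts))
```

Because str.split() only splits on whitespace, "The", "the" and "the," are counted as three different words; lowercasing each word or stripping punctuation before calling update would be needed if those should be merged.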

A similar question about counting unique words in Python can be found on Stack Overflow: https://stackoverflow.com/questions/11899878/
