python - Word count program for a txt file

I am counting the words in a txt file with the following code:

#!/usr/bin/python
file=open("D:\\zzzz\\names2.txt","r+")
wordcount={}
for word in file.read().split():
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1
print (word,wordcount)
file.close();

This gives me output like this:

>>> 
goat {'goat': 2, 'cow': 1, 'Dog': 1, 'lion': 1, 'snake': 1, 'horse': 1, 'ï»¿': 1, 'tiger': 1, 'cat': 2, 'dog': 1}

But I want the output in the following format:

word  wordcount
goat    2
cow     1
dog     1.....

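I am also getting an extra symbol (ï»¿) in the output. How can I remove it?

Best answer

The odd symbols you are seeing are the UTF-8 BOM (Byte Order Mark). To get rid of them, open the file with the correct encoding (I'm assuming you are on Python 3):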

file = open(r"D:\zzzz\names2.txt", "r", encoding="utf-8-sig")
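
To see why the encoding matters, here is a minimal sketch that writes a small sample file starting with the BOM bytes (EF BB BF) and reads it back two ways. The temporary file and its contents are made up for illustration. In the question the BOM shows up as ï»¿, most likely because the file was decoded with a legacy Windows code page such as cp1252, where those three bytes map to exactly those characters.

```python
import os
import tempfile

# Write a small sample file that starts with the UTF-8 BOM (bytes EF BB BF).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfgoat cat goat")
    sample_path = tmp.name

try:
    # Plain "utf-8" keeps the BOM as the character U+FEFF glued to the first word.
    with open(sample_path, "r", encoding="utf-8") as f:
        words_plain = f.read().split()

    # "utf-8-sig" strips a leading BOM if one is present.
    with open(sample_path, "r", encoding="utf-8-sig") as f:
        words_sig = f.read().split()
finally:
    os.remove(sample_path)

print(repr(words_plain[0]))  # '\ufeffgoat'
print(repr(words_sig[0]))    # 'goat'
```

Files without a BOM are read normally with "utf-8-sig", so it is a reasonable default when the origin of a UTF-8 file is unknown.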

Also, for the counting itself, you can use collections.Counter:

from collections import Counter
wordcount = Counter(file.read().split())
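
As a quick check that Counter produces the same tally as the manual dictionary loop from the question, the sketch below runs both on a made-up sample string (the string is an assumption for illustration, not the asker's file).

```python
from collections import Counter

text = "goat cow Dog lion goat cat dog cat"  # made-up sample input

# Manual counting, equivalent to the loop in the question.
manual = {}
for word in text.split():
    manual[word] = manual.get(word, 0) + 1

# Counter builds the same tally in a single call.
wordcount = Counter(text.split())

print(dict(wordcount) == manual)   # True
print(wordcount.most_common(2))    # [('goat', 2), ('cat', 2)]
```

Counter is a dict subclass, so the usual dictionary methods such as items() still work on it, and most_common() additionally returns the entries sorted by count.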

To display them:

>>> for item in wordcount.items(): print("{}\t{}".format(*item))
...
snake   1
lion    2
goat    2
horse   3
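
Putting the pieces of the answer together, one possible complete script might look like the sketch below. The helper names count_words and format_table are hypothetical, and the demo uses a temporary file with made-up contents in place of the asker's names2.txt. It also prints the "word  wordcount" header the question asks for and pads the first column so the counts line up.

```python
import os
import tempfile
from collections import Counter


def count_words(path):
    """Return a Counter of whitespace-separated words in a UTF-8 text file."""
    # utf-8-sig drops a leading BOM; files without a BOM are read normally.
    with open(path, "r", encoding="utf-8-sig") as f:
        return Counter(f.read().split())


def format_table(wordcount):
    """Build the 'word / wordcount' table as a list of text lines."""
    # Pad the first column to the longest word so the counts line up.
    width = max([len("word")] + [len(w) for w in wordcount])
    lines = ["{:<{w}}  {}".format("word", "wordcount", w=width)]
    for word, count in wordcount.most_common():
        lines.append("{:<{w}}  {}".format(word, count, w=width))
    return lines


# Demo on a temporary file written with a BOM; use your own path instead.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8-sig"
) as tmp:
    tmp.write("goat cow goat cat cat dog")
    sample_path = tmp.name

try:
    table = format_table(count_words(sample_path))
finally:
    os.remove(sample_path)

print("\n".join(table))
```

Using a with block closes the file automatically, which replaces the explicit file.close() call from the question.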

For python - word count program for a txt file, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/21107505/
