python - How to count word frequencies in a file in Python

I have a .txt file in the following format:

C
V
EH
A
IRQ
C
C
H
IRG
V

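Obviously the real file is much larger than that, but that is essentially it. I am trying to count how many times each individual string occurs in the file (each letter/string is on its own line, so technically the file is C\nV\nEH\n, etc.). However, when I read the file into a list and then use the count function, the letters get split apart, so a string like 'IRQ' becomes ['I', 'R', 'Q', '\n']. As a result, when I count, I get the frequency of each letter instead of the frequency of each string.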
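The splitting comes from `list += str`: augmented assignment on a list extends it with every element of the right-hand iterable, and iterating over a string yields its individual characters. A small illustration, assuming a line read from the file arrives as 'IRQ\n':

```python
# A line as it arrives when iterating over a text file (newline included).
line = "IRQ\n"

chars = []
chars += line                # list += str extends the list with each character
print(chars)                 # ['I', 'R', 'Q', '\n']

tokens = []
tokens.append(line.strip())  # append adds the whole stripped string as one item
print(tokens)                # ['IRQ']
```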

Here is the code I have written so far:

def countf():
    fh = open("C:/x.txt", "r")
    fh2 = open("C:/y.txt", "w")
    s = []
    for line in fh:
        s += line  # extends s with each character of line, e.g. 'I', 'R', 'Q', '\n'
    for x in s:
        fh2.write("{:<s} - {:<d}".format(x, s.count(x)))
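For comparison, a minimal sketch of the same function once each line is stripped and counted as a whole string, using a plain dict instead of `s.count()` (which rescans the whole list for every item and, in the loop above, writes each string once per occurrence). The helper name `count_strings` and the `src`/`dst` parameters are hypothetical; the default paths are the ones from the question.

```python
def count_strings(lines):
    """Count how many times each stripped, non-empty line occurs."""
    counts = {}
    for line in lines:
        token = line.strip()   # drop the trailing '\n' (and surrounding spaces)
        if not token:          # skip blank lines
            continue
        counts[token] = counts.get(token, 0) + 1
    return counts


def countf(src="C:/x.txt", dst="C:/y.txt"):
    # The with-block closes both files, even if an error occurs.
    with open(src) as fh, open(dst, "w") as fh2:
        for token, n in count_strings(fh).items():
            fh2.write("{} {}\n".format(token, n))  # one line per distinct string
```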

What I ultimately want is an output file that looks like this:

C  10
V  32
EH 7
A  1
IRQ  9
H 8
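To get counts lined up in one column as in the desired output, one option (an assumption about the exact layout wanted) is to pad each string to the width of the longest one with `str.ljust`. The function name `format_counts` is hypothetical:

```python
from collections import Counter


def format_counts(counts):
    """Return lines like 'C   2', with the counts lined up in one column."""
    if not counts:
        return []
    width = max(len(token) for token in counts)  # the longest string sets the column
    return ["{} {}".format(token.ljust(width), n) for token, n in counts.items()]


counts = Counter(["C", "V", "EH", "C", "IRQ"])
for row in format_counts(counts):
    print(row)
```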

Best answer

Use Counter(), and use strip() to remove the \n:

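from collections import Counter

with open('x.txt') as f1, open('y.txt', 'w') as f2:
    # strip() removes the trailing '\n', so each whole line is counted as one string
    c = Counter(x.strip() for x in f1)
    for x in c:
        print(x, c[x])
        f2.write("{} {}\n".format(x, c[x]))  # same result, written to y.txt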

Output (the order of the lines may differ between Python versions):

A 1
C 3
EH 1
IRQ 1
V 2
H 1
IRG 1
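If the output should be sorted by frequency, `Counter.most_common()` returns (string, count) pairs from most to least common. A sketch that also skips blank lines and writes the result to the output file; the names `count_tokens` and `write_frequencies` are hypothetical:

```python
from collections import Counter


def count_tokens(lines):
    # strip() removes the '\n'; the condition drops blank lines
    return Counter(line.strip() for line in lines if line.strip())


def write_frequencies(src, dst):
    with open(src) as f1, open(dst, "w") as f2:
        for token, n in count_tokens(f1).most_common():  # most frequent first
            f2.write("{} {}\n".format(token, n))


sample = ["C\n", "V\n", "EH\n", "A\n", "IRQ\n", "C\n", "C\n", "H\n", "IRG\n", "V\n"]
print(count_tokens(sample).most_common(2))  # [('C', 3), ('V', 2)]
```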

For python - How to count word frequencies in a file in Python, see the similar question on Stack Overflow: https://stackoverflow.com/questions/12117576/
