python - Finding the frequency of each word in a text file in Python

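I want to find the frequency of every word in my text file so that I can work out which words occur most often. Can someone show me the code to use for this?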

import nltk
text1 = "hello he heloo hello hi "  # example text
fdist1 = nltk.FreqDist(text1)

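I tried the code above, but instead of word frequencies it gives me the frequency of each character. I would also like to know how to read the text from a text file.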

Best answer

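I tried your example and saw the same thing you did. To make it work, you have to split the string on whitespace first. FreqDist counts the items it gets by iterating over its argument, and iterating over a string yields individual characters, which is why you got character counts. Passing it a list of words returns the correct count for each word instead.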

import nltk

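text1 = 'hello he heloo hello hi '
# split() with no argument splits on whitespace and drops empty strings,
# so the trailing space does not produce a '' entry
words = text1.split()
fdist1 = nltk.FreqDist(words)
print(fdist1.most_common(50))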
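
To see why the split matters, the same effect can be reproduced with the standard library's collections.Counter, which FreqDist subclasses in NLTK 3. This is only a sketch for illustration; the variable names are made up here and it runs without NLTK installed.

```python
from collections import Counter

text = "hello he heloo hello hi "

# Iterating over a string yields single characters,
# so this counts letters and spaces, not words.
char_counts = Counter(text)

# split() with no argument splits on runs of whitespace and drops
# empty strings, so this counts whole words.
word_counts = Counter(text.split())

print(char_counts.most_common(3))
print(word_counts.most_common(2))
```

The first print shows single characters with their counts, and the second shows words, with 'hello' counted twice.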

If you want to read from a file and get the word counts, you can do it like this:

input.txt

hello he heloo hello hi
my username is heinst
your username is frooty

Python code

import nltk

with open("input.txt", "r") as myfile:
    # split() with no argument splits on any whitespace, including newlines,
    # and never yields empty strings
    data = myfile.read().split()

fdist1 = nltk.FreqDist(data)
print(fdist1.most_common(50))
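
Real text files usually contain capital letters and punctuation, and a plain whitespace split treats those as part of the word, so "Hello," and "hello" would be counted separately. Below is a minimal sketch of one way to normalise the text first, using only the standard library. The helper names word_frequencies and file_word_frequencies and the regular expression are assumptions made for illustration and are not part of the original answer.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count words in text, ignoring case and surrounding punctuation."""
    # [a-z0-9']+ is a deliberately simple definition of a "word";
    # adjust it for other alphabets or tokenisation rules.
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(words)


def file_word_frequencies(path, encoding="utf-8"):
    """Read a whole text file and count its words."""
    with open(path, "r", encoding=encoding) as f:
        return word_frequencies(f.read())


sample = "Hello he heloo, hello hi.\nMy username is heinst\nYour username is frooty"
counts = word_frequencies(sample)
print(counts.most_common(3))
```

Calling file_word_frequencies("input.txt").most_common(50) would then give a list analogous to the one printed by the NLTK version above.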

This Q&A on finding the frequency of each word in a text file in Python is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/29052393/
