python - Print the 10 most frequently occurring words in a text, including and excluding stopwords

Tags: python nltk word-frequency find-occurrences

I adapted this question from here, with my own changes. I have the following code:

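import nltk  # needed because the function body refers to nltk.corpus

def content_text(text):
    stopwords = nltk.corpus.stopwords.words('english')
    # Note: `in` keeps only the stopwords; `not in` would keep the content words.
    content = [w for w in text if w.lower() in stopwords]
    return content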
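A minimal sketch of what `content_text` appears intended to do, assuming "content" means the words that are *not* stopwords (as the function name suggests). To stay runnable without downloading NLTK data, it takes the stopword list as a parameter; `SAMPLE_STOPWORDS` and `stopword_text` are hypothetical names introduced only for illustration, and the small set stands in for `nltk.corpus.stopwords.words('english')`.

```python
from typing import Iterable, List

# Hypothetical stand-in for nltk.corpus.stopwords.words('english'); use the real list in practice.
SAMPLE_STOPWORDS = {"the", "a", "is", "on", "and", "of"}


def content_text(words: Iterable[str], stopwords: Iterable[str]) -> List[str]:
    """Return the words that are NOT stopwords, comparing case-insensitively."""
    stop = {s.lower() for s in stopwords}  # a set makes each lookup O(1) instead of scanning a list
    return [w for w in words if w.lower() not in stop]


def stopword_text(words: Iterable[str], stopwords: Iterable[str]) -> List[str]:
    """Return only the stopwords (what the original `in` test actually computes)."""
    stop = {s.lower() for s in stopwords}
    return [w for w in words if w.lower() in stop]


if __name__ == "__main__":
    tokens = ["The", "cat", "is", "on", "the", "mat"]
    print(content_text(tokens, SAMPLE_STOPWORDS))   # ['cat', 'mat']
    print(stopword_text(tokens, SAMPLE_STOPWORDS))  # ['The', 'is', 'on', 'the']
```

The only difference between the two helpers is `in` versus `not in`, which is exactly the choice between "stopwords only" and "content words only".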

How do I print the 10 most frequently occurring words in a text 1) including and 2) excluding stopwords?

Best answer

NLTK has a FreqDist class for exactly this:

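import nltk

# Requires NLTK's tokenizer models and the stopwords corpus (install them with nltk.download).
allWords = nltk.tokenize.word_tokenize(text)
allWordDist = nltk.FreqDist(w.lower() for w in allWords)

stopwords = set(nltk.corpus.stopwords.words('english'))  # a set makes lookups fast
# Lowercase before the membership test so that e.g. "The" is also excluded.
allWordExceptStopDist = nltk.FreqDist(w.lower() for w in allWords if w.lower() not in stopwords)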
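The same two distributions can be sketched with the standard library's `collections.Counter` (which `FreqDist` subclasses in NLTK 3), so the logic can be tried without NLTK's tokenizer models or stopword corpus. Two assumptions: a simple regex tokenizer (`tokenize`, hypothetical) stands in for `word_tokenize`, and `SAMPLE_STOPWORDS` is a hypothetical subset of the English stopword list. The stopword test is applied to the lowercased word, so capitalized forms such as "The" are excluded too.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical subset of nltk.corpus.stopwords.words('english').
SAMPLE_STOPWORDS = frozenset({"the", "a", "is", "on", "and", "of", "it"})


def tokenize(text: str) -> List[str]:
    """Crude stand-in for nltk.tokenize.word_tokenize: lowercase runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_distributions(
    text: str, stopwords: Iterable[str] = SAMPLE_STOPWORDS
) -> Tuple[Counter, Counter]:
    """Return (counts of all words, counts of words excluding stopwords)."""
    stop = {s.lower() for s in stopwords}
    words = tokenize(text)  # already lowercased, so membership tests are case-insensitive
    all_dist = Counter(words)
    content_dist = Counter(w for w in words if w not in stop)
    return all_dist, content_dist


if __name__ == "__main__":
    all_dist, content_dist = word_distributions("The cat sat on the mat. The dog sat on the cat.")
    print(all_dist.most_common(3))      # [('the', 4), ('cat', 2), ('sat', 2)]
    print(content_dist.most_common(2))  # [('cat', 2), ('sat', 2)]
```

Words with equal counts come back in the order they were first seen, which is why "cat" precedes "sat" here.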

To extract the 10 most common words, with and without stopwords:

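mostCommon = [word for word, count in allWordDist.most_common(10)]
mostCommonExceptStop = [word for word, count in allWordExceptStopDist.most_common(10)]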
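`most_common(n)` on both `FreqDist` and `collections.Counter` returns a list of `(word, count)` tuples, not a dictionary, so the words have to be unpacked from the pairs. A small sketch of a helper that does this and prints both rankings; `top_words`, the token list and the stopword set are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import List


def top_words(dist: Counter, n: int = 10) -> List[str]:
    """Return the n most frequent words; most_common() yields (word, count) pairs."""
    return [word for word, _count in dist.most_common(n)]


if __name__ == "__main__":
    # Hypothetical token list and stopword set, for illustration only.
    tokens = "the cat sat on the mat and the dog sat on the log".split()
    stopwords = {"the", "on", "and"}

    including = Counter(tokens)
    excluding = Counter(t for t in tokens if t not in stopwords)

    print("Including stopwords:", top_words(including, 3))  # ['the', 'sat', 'on']
    print("Excluding stopwords:", top_words(excluding, 3))  # ['sat', 'cat', 'mat']
```

Because `FreqDist` is a `Counter` subclass in NLTK 3, the same helper should accept `allWordDist` and `allWordExceptStopDist` directly.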

For "python - Print the 10 most frequently occurring words in a text, including and excluding stopwords", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/28392860/
